Re: result present in Solr 1.4, but missing in Solr 3.5, dismax only
Robert,

You found it! It is the phrase slop. What do I do now? I am using Solr from trunk from December, and all those JIRA tixes are marked fixed …

- Naomi

Solr 1.4:
  lucene QueryParser:
    URL: q=all_search:The Beatles as musicians : Revolver through the Anthology~3
    final query: all_search:the beatl as musician revolv through the antholog~3
    got result

Solr 3.5:
  lucene QueryParser:
    URL: q=all_search:The Beatles as musicians : Revolver through the Anthology~3
    final query: all_search:the beatl as musician revolv through the antholog~3
    NO result
  lucene QueryParser:
    URL: q=all_search:The Beatles as musicians : Revolver through the Anthology
    final query: all_search:the beatl as musician revolv through the antholog

On Feb 22, 2012, at 7:34 PM, Robert Muir [via Lucene] wrote:

> On Wed, Feb 22, 2012 at 7:35 PM, Naomi Dushay [hidden email] wrote:
> > Jonathan has brought it to my attention that BOTH of my failing searches happen to have 8 terms, and one of the terms is repeated:
> >
> >   The Beatles as musicians : Revolver through the Anthology
> >   Color-blindness [print/digital]; its dangers and its detection
> >
> > but this is a PHRASE search.
>
> Can you take your same phrase queries, and simply add some slop to them (e.g. ~3) and ensure they still match with the lucene queryparser? SloppyPhraseQuery has a bit of a history with repeats since Lucene 2.9 that you were using.
>
> https://issues.apache.org/jira/browse/LUCENE-3068
> https://issues.apache.org/jira/browse/LUCENE-3215
> https://issues.apache.org/jira/browse/LUCENE-3412
>
> --
> lucidimagination.com

View this message in context: http://lucene.472066.n3.nabble.com/result-present-in-Solr-1-4-but-missing-in-Solr-3-5-dismax-only-tp3767851p3770665.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: result present in Solr 1.4, but missing in Solr 3.5, dismax only
Is it possible to also provide your document? If you could attach the document, the analysis config, and the queries to a JIRA issue, that would be ideal.

--
Robert Muir
lucidimagination.com
Re: result present in Solr 1.4, but missing in Solr 3.5, dismax only
Robert,

I will create a JIRA issue with the documentation.

FYI, I tried ps values of 3, 2, 1 and 0, and none of them worked with dismax. For the lucene QueryParser, only the value of 0 got results.

- Naomi
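The slop sweep described in this message (ps values 0 through 3 for dismax, and the same slops for the lucene QueryParser) could be scripted roughly as follows. This is only a sketch under stated assumptions: the searches are taken to be quoted phrases (the quotation marks and the & separators appear to have been lost in the archived URLs), the phrase is assumed to contain no embedded double quotes, and the /select handler path is a guess at the setup. The helper names are invented for illustration; defType, q, qf, pf, qs, ps and debugQuery are standard Solr request parameters.

```python
from urllib.parse import urlencode

FIELD = "all_search"
PHRASE = "The Beatles as musicians : Revolver through the Anthology"


def lucene_params(phrase, slop, field=FIELD):
    """Request params for the lucene query parser: field:"phrase"~slop."""
    # Assumes the phrase has no embedded double quotes that would need escaping.
    q = f'{field}:"{phrase}"' + (f"~{slop}" if slop > 0 else "")
    return {"defType": "lucene", "q": q, "debugQuery": "true"}


def dismax_params(phrase, ps, field=FIELD, qs=1):
    """Request params for dismax: a quoted phrase in q, with slop set via qs/ps."""
    return {
        "defType": "dismax",
        "q": f'"{phrase}"',
        "qf": field,
        "pf": field,
        "qs": str(qs),   # slop for explicit phrases in q (1 in the thread)
        "ps": str(ps),   # slop for the pf phrase boost (3 in the thread)
        "debugQuery": "true",
    }


def slop_sweep(phrase, slops=(0, 1, 2, 3)):
    """Yield (parser, slop, encoded query string) for every combination."""
    for slop in slops:
        yield "lucene", slop, urlencode(lucene_params(phrase, slop))
        yield "dismax", slop, urlencode(dismax_params(phrase, slop))


if __name__ == "__main__":
    # Hypothetical handler path; adjust to the actual Solr core.
    for parser, slop, query_string in slop_sweep(PHRASE):
        print(f"{parser} slop={slop}: /select?{query_string}")
```

Running the same sweep against both the 1.4 and the 3.5 instance and comparing hit counts per slop value would reproduce the comparison reported above.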
Re: result present in Solr 1.4, but missing in Solr 3.5, dismax only
Please attach your docs if you don't mind.

I worked up tests for this (in general, for ANY phrase query, increasing the slop should never remove results, only potentially add to them). The test already fails... but it's good to have your test case too.

--
Robert Muir
lucidimagination.com
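The invariant stated in this message (raising the slop must never lose a match) can be illustrated with a small brute-force model. This is a simplified sketch, not Lucene's SloppyPhraseScorer: it assumes a match is any assignment of query terms to distinct document positions, and that the cost of an assignment is the spread of (document position minus query position). All function names are invented for illustration.

```python
from itertools import product


def min_phrase_width(query_terms, doc_terms):
    """Smallest spread over all alignments of the query onto the document.

    Returns None when some query term cannot be placed at all.
    """
    positions = {}
    for pos, term in enumerate(doc_terms):
        positions.setdefault(term, []).append(pos)

    candidates = []
    for term in query_terms:
        if term not in positions:
            return None
        candidates.append(positions[term])

    best = None
    for combo in product(*candidates):
        # A repeated query term must land on distinct document positions.
        if len(set(combo)) != len(combo):
            continue
        shifted = [doc_pos - query_pos for query_pos, doc_pos in enumerate(combo)]
        width = max(shifted) - min(shifted)
        if best is None or width < best:
            best = width
    return best


def sloppy_match(query_terms, doc_terms, slop):
    """True when the best alignment fits within the given slop."""
    width = min_phrase_width(query_terms, doc_terms)
    return width is not None and width <= slop


def slop_is_monotonic(query_terms, doc_terms, max_slop=10):
    """Check that once a slop value matches, every larger value matches too."""
    matched = False
    for slop in range(max_slop + 1):
        now = sloppy_match(query_terms, doc_terms, slop)
        if matched and not now:
            return False
        matched = matched or now
    return True


if __name__ == "__main__":
    query = "the beatl as musician revolv through the antholog".split()
    for slop in range(4):
        print(slop, sloppy_match(query, query, slop))
```

In this model the property holds by construction, because a match is defined as "best width <= slop". The point of a regression test like the one described above is to assert the same property against the real scorer, where repeated terms (here "the" at two positions) make the alignment search harder.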
Re: result present in Solr 1.4, but missing in Solr 3.5, dismax only
Robert -

Did you mean for me to attach my docs to an existing ticket (which one?), or did you just want to make sure I attach the docs to the new issue?

- Naomi
Re: result present in Solr 1.4, but missing in Solr 3.5, dismax only
Please make a new one if you don't mind!

--
Robert Muir
lucidimagination.com
Re: result present in Solr 1.4, but missing in Solr 3.5, dismax only
Ticket created: https://issues.apache.org/jira/browse/SOLR-3158 (perhaps it's a lucene problem, not a Solr one -- feel free to move it or whatever.)

- Naomi
Re: result present in Solr 1.4, but missing in Solr 3.5, dismax only
So I don't really know what I'm talking about, and I'm not really sure if it's related or not, but your particular query:

  The Beatles as musicians : Revolver through the Anthology

with the lone word that's a ':', reminds me of a dismax stopwords-type problem I ran into.

Now, I ran into it on 1.4. I don't know why it would be different on 1.4 and 3.x. And I see you aren't even using a multi-field dismax in your sample query, so it couldn't possibly be what I ran into... I don't think. But I'll write this anyway in case it gives someone some ideas.

The problem I ran into is caused by different analysis in two fields both used in a dismax, one that ends up keeping : as a token, and one that doesn't. This ends up having the same effect as the famous 'dismax stopwords problem'.

Maybe somehow your schema changed in such a way as to produce this problem in 3.x but not in 1.4? Although again, I realize the fact that you are only using a single field in your demo dismax query kind of suggests it's not this problem. If you try the query without the : and the problem goes away, that might be a hint. Or maybe someone more skilled than I am at understanding what's in those Solr debug statements (it's kind of all Greek to me) will be able to take this hint and rule out or confirm that it may have something to do with your problem.

Here I write up the issue I ran into (which may or may not have anything to do with what you ran into):

http://bibwild.wordpress.com/2011/06/15/more-dismax-gotchas-varying-field-analysis-and-mm/

Also, you don't say what your 'mm' is in your dismax queries; that could be relevant if it's got anything to do with anything similar to the issue I'm talking about.

Hmm, I wonder if Solr 3.x changes the way dismax calculates the number of tokens for 'mm' in such a way that the 'varying field analysis dismax gotcha' can manifest with only one field, if the way dismax counts tokens for 'mm' differs from the number of tokens the single field's analysis produces?
Jonathan

On 2/22/2012 2:55 PM, Naomi Dushay wrote:

> I am working on upgrading Solr from 1.4 to 3.5, and I have hit a problem. I have a test checking for a search result in Solr, and the test passes in Solr 1.4, but fails in Solr 3.5. Dismax is the desired QueryParser -- I just included output from lucene QueryParser to prove the document exists and is found.
>
> I am completely stumped.
>
> Here are the debugQuery details:
>
> ***Solr 3.5***
>
> lucene QueryParser:
>   URL: q=all_search:The Beatles as musicians : Revolver through the Anthology
>   final query: all_search:the beatl as musician revolv through the antholog
>
>   6.0562754 = (MATCH) weight(all_search:the beatl as musician revolv through the antholog in 1064395), product of:
>     1.0 = queryWeight(all_search:the beatl as musician revolv through the antholog), product of:
>       48.450203 = idf(all_search: the=3531140 beatl=398 as=645923 musician=11805 revolv=872 through=81366 the=3531140 antholog=11611)
>       0.02063975 = queryNorm
>     6.0562754 = fieldWeight(all_search:the beatl as musician revolv through the antholog in 1064395), product of:
>       1.0 = tf(phraseFreq=1.0)
>       48.450203 = idf(all_search: the=3531140 beatl=398 as=645923 musician=11805 revolv=872 through=81366 the=3531140 antholog=11611)
>       0.125 = fieldNorm(field=all_search, doc=1064395)
>
> dismax QueryParser:
>   URL: qf=all_search&pf=all_search&q=The Beatles as musicians : Revolver through the Anthology
>   final query: +(all_search:the beatl as musician revolv through the antholog~1)~0.01 (all_search:the beatl as musician revolv through the antholog~3)~0.01
>
>   (no matches)
>
> ***Solr 1.4***
>
> lucene QueryParser:
>   URL: q=all_search:The Beatles as musicians : Revolver through the Anthology
>   final query: all_search:the beatl as musician revolv through the antholog
>
>   5.2676983 = fieldWeight(all_search:the beatl as musician revolv through the antholog in 3469163), product of:
>     1.0 = tf(phraseFreq=1.0)
>     48.16181 = idf(all_search: the=3542123 beatl=391 as=749890 musician=11955 revolv=820 through=88238 the=3542123 antholog=11205)
>     0.109375 = fieldNorm(field=all_search, doc=3469163)
>
> dismax QueryParser:
>   URL: qf=all_search&pf=all_search&q=The Beatles as musicians : Revolver through the Anthology
>   final query: +(all_search:the beatl as musician revolv through the antholog~1)~0.01 (all_search:the beatl as musician revolv through the antholog~3)~0.01
>
>   score: 7.449651 = (MATCH) sum of:
>     3.7248254 = weight(all_search:the beatl as musician revolv through the antholog~1 in 3469163), product of:
>       0.7071068 = queryWeight(all_search:the beatl as musician revolv through the antholog~1), product of:
>         48.16181 = idf(all_search: the=3542123 beatl=391 as=749890 musician=11955 revolv=820 through=88238 the=3542123 antholog=11205)
>         0.014681898 = queryNorm
>       5.2676983 = fieldWeight(all_search:the beatl as musician revolv through the antholog in 3469163), product of:
>         1.0 = tf(phraseFreq=1.0)
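The debugQuery output above is a tree of products, so the reported numbers can be checked by hand. The following sketch recomputes them from the leaf values quoted in the output. The function names are invented for illustration, and the tolerance is an assumption to allow for the single-precision rounding visible in the printed values.

```python
import math


def field_weight(tf, idf, field_norm):
    """fieldWeight as shown in the explain output: tf * idf * fieldNorm."""
    return tf * idf * field_norm


def query_weight(idf, query_norm):
    """queryWeight as shown in the explain output: idf * queryNorm."""
    return idf * query_norm


def clause_weight(tf, idf, field_norm, query_norm):
    """weight(...) = queryWeight * fieldWeight."""
    return query_weight(idf, query_norm) * field_weight(tf, idf, field_norm)


def close(a, b):
    # Loose tolerance: the printed values are rounded floats.
    return math.isclose(a, b, rel_tol=1e-5)


# Solr 3.5, lucene QueryParser (document 1064395)
fw_35 = field_weight(1.0, 48.450203, 0.125)
score_35 = clause_weight(1.0, 48.450203, 0.125, 0.02063975)

# Solr 1.4, first dismax clause (document 3469163)
fw_14 = field_weight(1.0, 48.16181, 0.109375)
qw_14 = query_weight(48.16181, 0.014681898)
clause_14 = clause_weight(1.0, 48.16181, 0.109375, 0.014681898)

if __name__ == "__main__":
    print("3.5 fieldWeight matches 6.0562754:", close(fw_35, 6.0562754))
    print("3.5 weight matches 6.0562754:", close(score_35, 6.0562754))
    print("1.4 fieldWeight matches 5.2676983:", close(fw_14, 5.2676983))
    print("1.4 queryWeight matches 0.7071068:", close(qw_14, 0.7071068))
    print("1.4 clause weight matches 3.7248254:", close(clause_14, 3.7248254))
```

The arithmetic is consistent in both versions, which suggests the scoring formula itself is not what changed. The difference is that in Solr 3.5 the sloppy clauses never produce a phrase match to score in the first place.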
Re: result present in Solr 1.4, but missing in Solr 3.5, dismax only
I forgot to include the field definition information:

schema.xml:

  <field name="all_search" type="text" indexed="true" stored="false" />

solr 3.5:

  <fieldtype name="text" class="solr.TextField" positionIncrementGap="100" autoGeneratePhraseQueries="true">
    <analyzer>
      <tokenizer class="solr.WhitespaceTokenizerFactory" />
      <filter class="solr.ICUFoldingFilterFactory" />
      <filter class="solr.WordDelimiterFilterFactory"
              splitOnCaseChange="1" generateWordParts="1" catenateWords="1"
              splitOnNumerics="0" generateNumberParts="1" catenateNumbers="1"
              catenateAll="0" preserveOriginal="0" stemEnglishPossessive="1" />
      <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt" />
      <filter class="solr.RemoveDuplicatesTokenFilterFactory" />
    </analyzer>
  </fieldtype>

solr 1.4:

  <fieldtype name="text" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
      <tokenizer class="solr.WhitespaceTokenizerFactory" />
      <filter class="schema.UnicodeNormalizationFilterFactory" version="icu4j" composed="false"
              remove_diacritics="true" remove_modifiers="true" fold="true" />
      <filter class="solr.WordDelimiterFilterFactory"
              splitOnCaseChange="1" generateWordParts="1" catenateWords="1"
              splitOnNumerics="0" generateNumberParts="1" catenateNumbers="1"
              catenateAll="0" preserveOriginal="0" stemEnglishPossessive="1" />
      <filter class="solr.LowerCaseFilterFactory" />
      <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt" />
      <filter class="solr.RemoveDuplicatesTokenFilterFactory" />
    </analyzer>
  </fieldtype>

And the analysis page shows the same results for Solr 3.5 and 1.4.

Solr 3.5:

  position     1      2      3      4         5       6        7      8
  term text    the    beatl  as     musician  revolv  through  the    antholog
  keyword      false  false  false  false     false   false    false  false
  startOffset  0      4      12     15        27      36       44     48
  endOffset    3      11     14     24        35      43       47     57
  type         word   word   word   word      word    word     word   word

Solr 1.4:

  term position     1      2      3      4         5       6        7      8
  term text         the    beatl  as     musician  revolv  through  the    antholog
  term type         word   word   word   word      word    word     word   word
  source start,end  0,3    4,11   12,14  15,24     27,35   36,43    44,47  48,57

- Naomi
Re: result present in Solr 1.4, but missing in Solr 3.5, dismax only
Jonathan, I have the same problem without the colon - I tested that, but didn't mention it. mm can't be the issue either: in Solr 3.5, if I remove one of the occurrences of the (doesn't matter which), I get results. Removing any other word does NOT get results. And if the query isn't a phrase query, it gets results. And no, it can't be related to what you refer to as the dismax stopwords problem, since i can demonstrate the problem with a single field. mm can't be the issue I have run into problems in the past with a non-alpha character surrounded by spaces tanking my search results for dismax … but I fixed that with this fieldType: !-- single token with punctuation terms removed so dismax doesn't look for punctuation terms in these fields -- !-- On client side, Lucene query parser breaks things up by whitespace *before* field analysis for dismax -- !-- so punctuation terms ( : ;) are stopwords to allow results from other fields when these chars are surrounded by spaces in query -- !-- do not lowercase -- fieldType name=string_punct_stop class=solr.TextField omitNorms=true analyzer type=index tokenizer class=solr.KeywordTokenizerFactory / filter class=solr.ICUNormalizer2FilterFactory name=nfkc mode=compose / /analyzer analyzer type=query tokenizer class=solr.KeywordTokenizerFactory / filter class=solr.ICUNormalizer2FilterFactory name=nfkc mode=compose / !-- removing punctuation for Lucene query parser issues -- filter class=solr.StopFilterFactory ignoreCase=true words=stopwords_punctuation.txt enablePositionIncrements=true / /analyzer /fieldType My stopwords_punctuation.txt file is #Punctuation characters we want to ignore in queries : ; / and used this type instead of string for fields in my dismax qf.Thus, the punctuation terms in the query are not present for the fields that were formerly string fields. 
- Naomi

On Feb 22, 2012, at 3:41 PM, Jonathan Rochkind wrote:

So I don't really know what I'm talking about, and I'm not really sure if it's related or not, but your particular query:

    The Beatles as musicians : Revolver through the Anthology

With the lone word that's a ':', reminds me of a dismax stopwords-type problem I ran into. Now, I ran into it on 1.4. I don't know why it would be different on 1.4 and 3.x. And I see you aren't even using a multi-field dismax in your sample query, so it couldn't possibly be what I ran into... I don't think. But I'll write this anyway in case it gives someone some ideas.

The problem I ran into is caused by different analysis in two fields both used in a dismax, one that ends up keeping : as a token, and one that doesn't. Which ends up having the same effect as the famous 'dismax stopwords problem'.

Maybe somehow your schema changed such to produce this problem in 3.x but not in 1.4? Although again I realize the fact that you are only using a single field in your demo dismax query kind of suggests it's not this problem. Wonder if you try the query without the :, if the problem goes away, that might be a hint. Or, maybe someone more skilled at understanding what's in those Solr debug statements than I am (it's kind of all greek to me) will be able to take this hint and rule out or confirm that it may have something to do with your problem.

Here I write up the issue I ran into (which may or may not have anything to do with what you ran into)

http://bibwild.wordpress.com/2011/06/15/more-dismax-gotchas-varying-field-analysis-and-mm/

Also, you don't say what your 'mm' is in your dismax queries, that could be relevant if it's got anything to do with anything similar to the issue I'm talking about.

Hmm, I wonder if Solr 3.x changes the way dismax calculates number of tokens for 'mm' in such a way that the 'varying field analysis dismax gotcha' can manifest with only one field, if the way dismax counts tokens for 'mm' differs from number of tokens the single field's analysis produces?

Jonathan

On 2/22/2012 2:55 PM, Naomi Dushay wrote:

I am working on upgrading Solr from 1.4 to 3.5, and I have hit a problem. I have a test checking for a search result in Solr, and the test passes in Solr 1.4, but fails in Solr 3.5. Dismax is the desired QueryParser -- I just included output from lucene QueryParser to prove the document exists and is found.

I am completely stumped.

Here are the debugQuery details:

***Solr 3.5***

lucene QueryParser:
URL: q=all_search:"The Beatles as musicians : Revolver through the Anthology"
final query: all_search:"the beatl as musician revolv through the antholog"

6.0562754 = (MATCH) weight(all_search:"the beatl as musician revolv through the antholog" in 1064395), product of:
  1.0 = queryWeight(all_search:"the beatl as musician revolv through the antholog"), product of:
    48.450203 = idf(all_search:
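Jonathan's question about how many clauses 'mm' ends up requiring can be made concrete. The sketch below follows the documented semantics of the dismax mm parameter (plain integers, negative values, percentages, and conditional `N<expr` specs). It is an independent illustration, not Solr's source code, and the spec `6<-1 6<90%` is used only as an example value in that syntax.

```python
import re


def min_should_match(optional_clauses: int, spec: str) -> int:
    """Return how many optional clauses must match for a given mm spec."""
    spec = spec.strip()
    result = optional_clauses
    if "<" in spec:
        # Conditional spec(s): "N<expr" applies only when there are more than
        # N optional clauses; later conditionals override earlier ones.
        spec = re.sub(r"\s*<\s*", "<", spec)
        for part in spec.split():
            bound, expr = part.split("<", 1)
            if optional_clauses <= int(bound):
                return result
            result = min_should_match(optional_clauses, expr)
        return result
    if spec.endswith("%"):
        # Percentage of the clause count, truncated toward zero.
        calc = int(optional_clauses * int(spec[:-1]) / 100)
    else:
        calc = int(spec)
    # Negative values mean "all but this many".
    result = optional_clauses + calc if calc < 0 else calc
    return max(0, min(optional_clauses, result))


if __name__ == "__main__":
    for n in (1, 6, 8, 20):
        print(n, min_should_match(n, "6<-1 6<90%"))
```

A query sent as one quoted phrase reaches dismax as a single clause, so under a spec like this that one clause is simply required; the token count inside the phrase does not enter the mm calculation in this sketch.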
Re: result present in Solr 1.4, but missing in Solr 3.5, dismax only
Jonathan has brought it to my attention that BOTH of my failing searches happen to have 8 terms, and one of the terms is repeated:

    The Beatles as musicians : Revolver through the Anthology
    Color-blindness [print/digital]; its dangers and its detection

but this is a PHRASE search.

In case it's relevant, both Solr 1.4 and Solr 3.5:

    do NOT use stopwords in the fieldtype
    mm is 6<-1 6<90% for dismax
    qs is 1
    ps is 3

And both use this filter last:

    <filter class="solr.RemoveDuplicatesTokenFilterFactory" />

… but I believe that filter is only used for consecutive tokens.

Lastly,

    Color-blindness [print/digital]; its and its detection

works ("dangers" is removed, rather than one of the repeated "its").

- Naomi

On 2/22/2012 2:55 PM, Naomi Dushay wrote:

I am working on upgrading Solr from 1.4 to 3.5, and I have hit a problem. I have a test checking for a search result in Solr, and the test passes in Solr 1.4, but fails in Solr 3.5. Dismax is the desired QueryParser -- I just included output from lucene QueryParser to prove the document exists and is found.

I am completely stumped.
Here are the debugQuery details:

***Solr 3.5***

lucene QueryParser:
URL: q=all_search:"The Beatles as musicians : Revolver through the Anthology"
final query: all_search:"the beatl as musician revolv through the antholog"

6.0562754 = (MATCH) weight(all_search:"the beatl as musician revolv through the antholog" in 1064395), product of:
  1.0 = queryWeight(all_search:"the beatl as musician revolv through the antholog"), product of:
    48.450203 = idf(all_search: the=3531140 beatl=398 as=645923 musician=11805 revolv=872 through=81366 the=3531140 antholog=11611)
    0.02063975 = queryNorm
  6.0562754 = fieldWeight(all_search:"the beatl as musician revolv through the antholog" in 1064395), product of:
    1.0 = tf(phraseFreq=1.0)
    48.450203 = idf(all_search: the=3531140 beatl=398 as=645923 musician=11805 revolv=872 through=81366 the=3531140 antholog=11611)
    0.125 = fieldNorm(field=all_search, doc=1064395)

dismax QueryParser:
URL: qf=all_search&pf=all_search&q="The Beatles as musicians : Revolver through the Anthology"
final query: +(all_search:"the beatl as musician revolv through the antholog"~1)~0.01 (all_search:"the beatl as musician revolv through the antholog"~3)~0.01

(no matches)

***Solr 1.4***

lucene QueryParser:
URL: q=all_search:"The Beatles as musicians : Revolver through the Anthology"
final query: all_search:"the beatl as musician revolv through the antholog"

5.2676983 = fieldWeight(all_search:"the beatl as musician revolv through the antholog" in 3469163), product of:
  1.0 = tf(phraseFreq=1.0)
  48.16181 = idf(all_search: the=3542123 beatl=391 as=749890 musician=11955 revolv=820 through=88238 the=3542123 antholog=11205)
  0.109375 =
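The Solr 3.5 explain output for the lucene QueryParser can be checked by hand, which confirms that the document matches the exact phrase with an ordinary score. A minimal sketch, assuming the product structure the explain output itself shows (weight = queryWeight x fieldWeight); the function names are illustrative and not a Lucene API.

```python
def query_weight(idf: float, query_norm: float) -> float:
    """queryWeight as shown in the explain output: idf * queryNorm."""
    return idf * query_norm


def field_weight(tf: float, idf: float, field_norm: float) -> float:
    """fieldWeight as shown in the explain output: tf * idf * fieldNorm."""
    return tf * idf * field_norm


if __name__ == "__main__":
    # Numbers copied from the Solr 3.5 explain output above.
    idf = 48.450203
    qw = query_weight(idf, 0.02063975)
    fw = field_weight(1.0, idf, 0.125)
    # The products should agree with the reported 1.0, 6.0562754 and
    # 6.0562754, up to single-precision rounding in the original output.
    print(qw, fw, qw * fw)
```

The explain values are single-precision floats, so the recomputed products agree only to about six or seven significant digits.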
Re: result present in Solr 1.4, but missing in Solr 3.5, dismax only
On Wed, Feb 22, 2012 at 7:35 PM, Naomi Dushay ndus...@stanford.edu wrote:

Jonathan has brought it to my attention that BOTH of my failing searches happen to have 8 terms, and one of the terms is repeated:

    The Beatles as musicians : Revolver through the Anthology
    Color-blindness [print/digital]; its dangers and its detection

but this is a PHRASE search.

Can you take your same phrase queries, and simply add some slop to them (e.g. ~3) and ensure they still match with the lucene queryparser?

SloppyPhraseQuery has a bit of a history with repeats since Lucene 2.9 that you were using.

https://issues.apache.org/jira/browse/LUCENE-3068
https://issues.apache.org/jira/browse/LUCENE-3215
https://issues.apache.org/jira/browse/LUCENE-3412

--
lucidimagination.com
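The check suggested here, running the same phrase through the lucene query parser with and without slop, can be scripted. The sketch below only builds the two request URLs; the base URL `http://localhost:8983/solr/select` is a placeholder assumption, and `phrase_query_url` is a made-up helper name. `q`, `defType`, and `debugQuery` are standard Solr request parameters.

```python
from typing import Optional
from urllib.parse import urlencode

# Placeholder: adjust to the Solr instance under test.
SOLR_SELECT = "http://localhost:8983/solr/select"


def phrase_query_url(field: str, phrase: str, slop: Optional[int] = None) -> str:
    """Build a lucene-parser phrase query URL, optionally with ~slop."""
    q = f'{field}:"{phrase}"'
    if slop is not None:
        q += f"~{slop}"
    params = {"q": q, "defType": "lucene", "debugQuery": "true"}
    return f"{SOLR_SELECT}?{urlencode(params)}"


if __name__ == "__main__":
    phrase = "The Beatles as musicians : Revolver through the Anthology"
    print(phrase_query_url("all_search", phrase))          # exact phrase
    print(phrase_query_url("all_search", phrase, slop=3))  # sloppy phrase
```

If the exact-phrase URL returns the document and the sloppy one does not, the difference is isolated to sloppy phrase matching rather than to dismax or mm.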