SOLR 4.x vs 3.x parsedquery differences
I'm migrating from 3.x to 4.x and I'm running some queries to verify that everything works as before. However, I've found that the query galaxy s3 returns far fewer results: in 3.x numFound=1628, in 4.x numFound=70.

Here's the relevant part of the schema:

<fieldtype name="text_pt" class="solr.TextField" positionIncrementGap="100" autoGeneratePhraseQueries="false">
  <analyzer type="index">
    <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="-" replacement="IIIHYPHENIII"/>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.PatternReplaceFilterFactory" pattern="IIIHYPHENIII" replacement="-"/>
    <filter class="solr.ASCIIFoldingFilterFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" preserveOriginal="1" catenateWords="1" catenateNumbers="1" catenateAll="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="false" words="portugueseStopWords.txt"/>
    <filter class="solr.BrazilianStemFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="-" replacement="IIIHYPHENIII"/>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.PatternReplaceFilterFactory" pattern="IIIHYPHENIII" replacement="-"/>
    <filter class="solr.ASCIIFoldingFilterFactory"/>
    <filter class="solr.SynonymFilterFactory" ignoreCase="true" synonyms="portugueseSynonyms.txt" expand="true"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" preserveOriginal="1" catenateNumbers="0" catenateAll="0" protected="protwords.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="false" words="portugueseStopWords.txt"/>
    <filter class="solr.BrazilianStemFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldtype>

The synonyms involved in this query are:

siii, s3
galaxy, galax

My default search operator is AND (in both versions, even though it's deprecated in 4.x), and the debug output is:

SOLR 3.x

<str name="parsedquery">+(title_search_pt:galaxy title_search_pt:galax) +MultiPhraseQuery(title_search_pt:"(sii s3 s) 3")</str>

SOLR 4.x

<str name="parsedquery">+((title_search_pt:galaxy title_search_pt:galax)/no_coord) +(+title_search_pt:sii +title_search_pt:s3 +title_search_pt:s +title_search_pt:3)</str>

The weird thing is that it does not return results like 'galaxy s3'. This is the debug explanation for such a document:

no match on required clause (+title_search_pt:sii +title_search_pt:s3 +title_search_pt:s +title_search_pt:3)
  (NON-MATCH) Failure to meet condition(s) of required/prohibited clause(s)
    *no match on required clause (title_search_pt:sii)*
      (NON-MATCH) no matching term
    (MATCH) weight(title_search_pt:s3 in 1834535)
    (MATCH) weight(title_search_pt:s in 1834535)
    (MATCH) weight(title_search_pt:3 in 1834535)

How is it that sii is *required* when it should be OR'ed with s and s3? The analysis output shows that sii has token position 2, just like its synonyms:

position  1       2     3
          galaxy  sii   3
          galax   s3
                  s

Thanks,
Raúl Cardozo.
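To see why sii ends up required, here is a simplified Python model of the two ways the parser can combine the analyzed tokens of the s3 chunk. It is an illustrative assumption, not Lucene's actual query-building code, and the function names are hypothetical. The classic query parser analyzes each whitespace-separated term on its own, which is why galaxy/galax and the s3 tokens form separate top-level clauses; the token list and positions below come from the analysis output in the question. Adding every token as its own clause (required under AND) ignores which tokens share a position and reproduces the 4.x clause, while grouping tokens by position reproduces the 3.x MultiPhraseQuery.

```python
from itertools import groupby

# Tokens the query analyzer produced for the "s3" chunk, as (term, position).
# sii, s3 and s share one position (synonym + word-delimiter parts); 3 follows.
S3_TOKENS = [("sii", 1), ("s3", 1), ("s", 1), ("3", 2)]


def flat_boolean_query(field, tokens, default_operator="AND"):
    """Hypothetical model of autoGeneratePhraseQueries=false.

    If every token sits at one position, they are treated as synonyms and OR'ed.
    Otherwise each token becomes its own clause, required (+) under AND,
    regardless of which tokens share a position.
    """
    if len({pos for _, pos in tokens}) == 1:
        return "(" + " ".join(f"{field}:{term}" for term, _ in tokens) + ")"
    prefix = "+" if default_operator == "AND" else ""
    return "(" + " ".join(f"{prefix}{field}:{term}" for term, _ in tokens) + ")"


def multi_phrase_query(field, tokens):
    """Hypothetical model of autoGeneratePhraseQueries=true.

    Tokens (sorted by position) are grouped by position; each group is a set of
    alternatives, and the groups must occur in sequence.
    """
    groups = []
    for _, group in groupby(tokens, key=lambda token: token[1]):
        terms = [term for term, _ in group]
        groups.append(terms[0] if len(terms) == 1 else "(" + " ".join(terms) + ")")
    return f'MultiPhraseQuery({field}:"' + " ".join(groups) + '")'


if __name__ == "__main__":
    print(flat_boolean_query("title_search_pt", S3_TOKENS))
    print(multi_phrase_query("title_search_pt", S3_TOKENS))
```

The two printed strings match the s3 clauses of the 4.x and 3.x parsedquery lines respectively, and calling flat_boolean_query on [("galaxy", 1), ("galax", 1)] gives the OR'ed galaxy/galax clause, since both tokens share one position.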
Re: SOLR 4.x vs 3.x parsedquery differences
: I'm migrating from 3.x to 4.x and I'm running some queries to verify that
: everything works like before. I've found however that the query galaxy s3
: is giving much less results. In 3.x numFound=1628, in 4.x numFound=70.

Is your entire schema 100% identical in both cases? What is the luceneMatchVersion set to in your solrconfig.xml?

By the looks of your debug output, it appears that you are using autoGeneratePhraseQueries=true in 3x, but have it set to false in 4x; yet the fieldType you posted here shows it set to false:

: <fieldtype name="text_pt" class="solr.TextField"
: positionIncrementGap="100" autoGeneratePhraseQueries="false">

...I haven't tried to reproduce your specific situation, but that configuration doesn't smell right compared with what you are showing for the 3x output...

: SOLR 3.x
:
: <str name="parsedquery">+(title_search_pt:galaxy
: title_search_pt:galax) +MultiPhraseQuery(title_search_pt:"(sii s3 s)
: 3")</str>
:
: SOLR 4.x
:
: <str name="parsedquery">+((title_search_pt:galaxy
: title_search_pt:galax)/no_coord) +(+title_search_pt:sii
: +title_search_pt:s3 +title_search_pt:s +title_search_pt:3)</str>

-Hoss
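A rough Python model of why the switch matters for a title containing galaxy s3. It assumes, hypothetically, that the index analyzer stored s3 and s at one position and 3 at the next, with no sii because the synonym filter only runs in the query analyzer. That assumption is consistent with the debug explanation in the question, where s3, s and 3 match but sii has no matching term. The function names and positions are illustrative, not taken from Solr.

```python
# Hypothetical indexed terms for the "s3" part of a title, mapped to positions.
# No "sii": the synonym filter is only configured in the query analyzer.
DOC_POSITIONS = {"s3": {1}, "s": {1}, "3": {2}}


def matches_all_required(doc_positions, required_terms):
    """Model of the 4.x clause: every term is required (+), so all must be present."""
    return all(term in doc_positions for term in required_terms)


def matches_multi_phrase(doc_positions, groups):
    """Model of the 3.x MultiPhraseQuery: one term from each group must occur,
    with the groups at consecutive positions (slop 0)."""
    starts = set()
    for term in groups[0]:
        starts |= doc_positions.get(term, set())
    return any(
        all(
            any(start + offset in doc_positions.get(term, set()) for term in group)
            for offset, group in enumerate(groups)
        )
        for start in starts
    )


if __name__ == "__main__":
    print(matches_all_required(DOC_POSITIONS, ["sii", "s3", "s", "3"]))
    print(matches_multi_phrase(DOC_POSITIONS, [["sii", "s3", "s"], ["3"]]))
```

Under the "everything required" model the document is rejected because sii is absent, mirroring the NON-MATCH in the question; the position-grouped model accepts it because s3 (one of the alternatives) is immediately followed by 3.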
Re: SOLR 4.x vs 3.x parsedquery differences
Whether or not we like the behaviour we are getting in 3.x, I'm required to keep everything working as closely as possible to how it worked before. I have no idea why this was happening, but setting that field to true solved the issue: I now get exactly the same number of items for both queries!

I won't bother checking why that was so, since we'll be moving away from the older version, which is the one showing the inconsistency. But thanks a million. If you have an SO user, I can mark yours as the answer here: http://stackoverflow.com/questions/18661996/solr-4-x-vs-3-x-parsedquery-differences

Cheers

On Sep 6, 2013 4:15 PM, Chris Hostetter <hossman_luc...@fucit.org> wrote:
> Bottom line: if you *want* the auto generated phrase query behavior in
> 4.x, you should just set 'autoGeneratePhraseQueries=true' on your fieldType.
Re: SOLR 4.x vs 3.x parsedquery differences
: Our schema is identical except the version.
: In 3.x it's 1.1 and in 4.x it's 1.5.

That's kind of a significant difference to leave out. Independent of the question you are asking about here, it's going to make quite a few differences in how things are being parsed, and what the defaults are.

If I'm understanding correctly: you like the behavior you are getting from Solr 3.x, where phrases are generated automatically for you.

What I can't understand is how/why phrases are being generated automatically for you if you have 'autoGeneratePhraseQueries=false' on your fieldType in your 3x schema ... that makes no sense to me.

If you didn't have autoGeneratePhraseQueries specified at all, then the 'version=1.1' would explain it (up to version=1.3, the default for autoGeneratePhraseQueries was true, but in version=1.4 and above, it defaults to false), but with an explicit 'autoGeneratePhraseQueries=false' I can't explain why 3x works the way you say it works for you.

Bottom line: if you *want* the auto generated phrase query behavior in 4.x, you should just set 'autoGeneratePhraseQueries=true' on your fieldType.

-Hoss
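The default rule Hoss describes can be written down as a small Python function. It is a hypothetical helper, not Solr code: an explicit autoGeneratePhraseQueries attribute on the fieldType wins; otherwise the schema version decides, with true up to 1.3 and false from 1.4 on. It deliberately does not try to explain the 3.x behaviour seen in this thread, where an explicit false still produced phrase queries.

```python
from typing import Optional


def effective_auto_generate_phrase_queries(schema_version: float,
                                           explicit: Optional[bool] = None) -> bool:
    """Hypothetical helper encoding the defaults described in this thread.

    An explicit fieldType attribute takes precedence; otherwise schema
    versions up to 1.3 default to True and 1.4 and above default to False.
    """
    if explicit is not None:
        return explicit
    return schema_version <= 1.3


if __name__ == "__main__":
    print(effective_auto_generate_phrase_queries(1.1))        # old-style default
    print(effective_auto_generate_phrase_queries(1.5))        # 4.x schema, attribute omitted
    print(effective_auto_generate_phrase_queries(1.5, True))  # the suggested fix
```

Under this rule a version=1.5 schema only gets phrase queries back when the attribute is set explicitly, which is the "bottom line" fix.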
Re: SOLR 4.x vs 3.x parsedquery differences
On 9/6/2013 12:46 PM, Fermin Silva wrote:
> Our schema is identical except the version.
> In 3.x it's 1.1 and in 4.x it's 1.5.
> Also in solrconfig.xml we have no lucene version for 3.x (so it's using
> 2_4 i believe) and in 4.x we fixed it to 4_4.

The autoGeneratePhraseQueries parameter didn't exist before schema version 1.4. I'm fairly sure that for your schema, which is at version 1.1, the autoGeneratePhraseQueries value specified in the field definition will be ignored and the value actually used will be true, which goes along with what Hoss has said. See the comment about the version in the example schema in any 4.x Solr download.

Thanks,
Shawn
Re: SOLR 4.x vs 3.x parsedquery differences
Hi,

Our schema is identical except the version. In 3.x it's 1.1 and in 4.x it's 1.5.
Also, in solrconfig.xml we have no Lucene version set for 3.x (so it's using 2_4, I believe), and in 4.x we fixed it to 4_4.

Thanks

On Sep 6, 2013 3:34 PM, Chris Hostetter <hossman_luc...@fucit.org> wrote:
> is your entire schema 100% identical in both cases?
> what is the luceneMatchVersion set to in your solrconfig.xml?