Hello, I am using Solr to index and search documents in Russian. I have successfully set up the RussianAnalyzer, but found that it eliminates some tokens, such as numbers. I am therefore indexing my text fields in two ways. First, with a fairly literal version of the text, using something similar to the textTight field type in the example config:
<fieldtype name="text_literal" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="false"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="0" generateNumberParts="0" catenateWords="1" catenateNumbers="1" catenateAll="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldtype>

Second, I index my fields again using the RussianAnalyzer to cover Russian stemming and stop words:

<fieldtype name="text_ru_RU" class="solr.TextField">
  <analyzer class="org.apache.lucene.analysis.ru.RussianAnalyzer"/>
</fieldtype>

I then declare my field names:

<dynamicField name="*_ru_RU" type="text_ru_RU" indexed="true" stored="false"/>
<dynamicField name="*_literal" type="text_literal" indexed="true" stored="false"/>

and use the copyField feature to index them twice:

<copyField source="title_ru_RU" dest="title_literal"/>
<copyField source="location_ru_RU" dest="location_literal"/>
<copyField source="body_ru_RU" dest="body_literal"/>

Finally, I define my own DisMaxRequestHandler in solrconfig.xml:

<requestHandler name="dismax_ru_RU" class="solr.DisMaxRequestHandler">
  <lst name="defaults">
    <float name="tie">0.01</float>
    <str name="qf">
      title_literal^1.5 title_ru_RU^1.3 body_literal^1.0 body_ru_RU^0.8 location_literal^0.5 location_ru_RU^0.4
    </str>
    <str name="pf">
      title_literal^1.5 title_ru_RU^1.3 body_literal^1.0 body_ru_RU^0.8 location_literal^0.5 location_ru_RU^0.4
    </str>
    <str name="mm">100%</str>
    <int name="ps">100</int>
  </lst>
</requestHandler>

Because I am searching through classified ads, date sorting is more important to me than relevance, so I sort by date first and then by score. I expect the system to return all matches for today's ads sorted by relevance, followed by matches for yesterday's ads sorted by relevance, and so on.
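For reference, the date-then-score sort can also be baked into the handler defaults instead of being passed on every request. This is only a sketch: it assumes a hypothetical date-typed field named posted_date, which is not part of the schema excerpt above.

```xml
<!-- Sketch only: "posted_date" is a hypothetical date-typed field,
     not shown in the schema excerpt above. -->
<requestHandler name="dismax_ru_RU" class="solr.DisMaxRequestHandler">
  <lst name="defaults">
    <!-- newest ads first; ties within the same date broken by relevance score -->
    <str name="sort">posted_date desc, score desc</str>
  </lst>
</requestHandler>
```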
I would also like the search to return only ads where every single term of the query was found across my 3 fields (title, body, location). I can't seem to get this to work. A search for '1970' works fine and returns 2 ads containing 1970. A search for 'Ташкент' gives me 3 results, including one matched via Russian stemming (Ташкента). But when I search for '1970 Ташкент', the 1970 seems to be ignored and I get the same results as when searching for 'Ташкент' alone. I turned on the debug output, and 1970 is indeed missing from the matching:

<lst name="debug">
  <str name="rawquerystring">"1970 Ташкент"</str>
  <str name="querystring">"1970 Ташкент"</str>
  <str name="parsedquery">+DisjunctionMaxQuery((body_ru_RU:ташкент^0.8 | body_literal:"1970 ташкент" | title_ru_RU:ташкент^1.3 | location_literal:"1970 ташкент"^0.5 | location_ru_RU:ташкент^0.4 | title_literal:"1970 ташкент"^1.5)~0.01) DisjunctionMaxQuery((body_ru_RU:ташкент^0.8 | body_literal:"1970 ташкент"~100 | title_ru_RU:ташкент^1.3 | location_literal:"1970 ташкент"~100^0.5 | location_ru_RU:ташкент^0.4 | title_literal:"1970 ташкент"~100^1.5)~0.01)</str>
  <str name="parsedquery_toString">+(body_ru_RU:ташкент^0.8 | body_literal:"1970 ташкент" | title_ru_RU:ташкент^1.3 | location_literal:"1970 ташкент"^0.5 | location_ru_RU:ташкент^0.4 | title_literal:"1970 ташкент"^1.5)~0.01 (body_ru_RU:ташкент^0.8 | body_literal:"1970 ташкент"~100 | title_ru_RU:ташкент^1.3 | location_literal:"1970 ташкент"~100^0.5 | location_ru_RU:ташкент^0.4 | title_literal:"1970 ташкент"~100^1.5)~0.01</str>
  <lst name="explain">
    <str name="id=€#26;,internal_docid=4">
      0.7263521 = (MATCH) sum of:
        0.36317605 = (MATCH) max plus 0.01 times others of:
          0.36317605 = (MATCH) weight(location_ru_RU:ташкент^0.4 in 4), product of:
            0.08076847 = queryWeight(location_ru_RU:ташкент^0.4), product of:
              0.4 = boost
              4.4965076 = idf(docFreq=2)
              0.044906225 = queryNorm
            4.4965076 = (MATCH) fieldWeight(location_ru_RU:ташкент in 4), product of:
              1.0 = tf(termFreq(location_ru_RU:ташкент)=1)
              4.4965076 = idf(docFreq=2)
              1.0 = fieldNorm(field=location_ru_RU, doc=4)
        0.36317605 = (MATCH) max plus 0.01 times others of:
          0.36317605 = (MATCH) weight(location_ru_RU:ташкент^0.4 in 4), product of:
            0.08076847 = queryWeight(location_ru_RU:ташкент^0.4), product of:
              0.4 = boost
              4.4965076 = idf(docFreq=2)
              0.044906225 = queryNorm
            4.4965076 = (MATCH) fieldWeight(location_ru_RU:ташкент in 4), product of:
              1.0 = tf(termFreq(location_ru_RU:ташкент)=1)
              4.4965076 = idf(docFreq=2)
              1.0 = fieldNorm(field=location_ru_RU, doc=4)
    </str>
    <str name="id=€#26;ી,internal_docid=9">
      0.7263521 = (MATCH) sum of:
        0.36317605 = (MATCH) max plus 0.01 times others of:
          0.36317605 = (MATCH) weight(location_ru_RU:ташкент^0.4 in 9), product of:
            0.08076847 = queryWeight(location_ru_RU:ташкент^0.4), product of:
              0.4 = boost
              4.4965076 = idf(docFreq=2)
              0.044906225 = queryNorm
            4.4965076 = (MATCH) fieldWeight(location_ru_RU:ташкент in 9), product of:
              1.0 = tf(termFreq(location_ru_RU:ташкент)=1)
              4.4965076 = idf(docFreq=2)
              1.0 = fieldNorm(field=location_ru_RU, doc=9)
        0.36317605 = (MATCH) max plus 0.01 times others of:
          0.36317605 = (MATCH) weight(location_ru_RU:ташкент^0.4 in 9), product of:
            0.08076847 = queryWeight(location_ru_RU:ташкент^0.4), product of:
              0.4 = boost
              4.4965076 = idf(docFreq=2)
              0.044906225 = queryNorm
            4.4965076 = (MATCH) fieldWeight(location_ru_RU:ташкент in 9), product of:
              1.0 = tf(termFreq(location_ru_RU:ташкент)=1)
              4.4965076 = idf(docFreq=2)
              1.0 = fieldNorm(field=location_ru_RU, doc=9)
    </str>
    <str name="id=€#26;,internal_docid=2">
      0.43162674 = (MATCH) sum of:
        0.21581337 = (MATCH) max plus 0.01 times others of:
          0.21581337 = (MATCH) weight(body_ru_RU:ташкент^0.8 in 2), product of:
            0.17610328 = queryWeight(body_ru_RU:ташкент^0.8), product of:
              0.8 = boost
              4.901973 = idf(docFreq=1)
              0.044906225 = queryNorm
            1.2254932 = (MATCH) fieldWeight(body_ru_RU:ташкент in 2), product of:
              1.0 = tf(termFreq(body_ru_RU:ташкент)=1)
              4.901973 = idf(docFreq=1)
              0.25 = fieldNorm(field=body_ru_RU, doc=2)
        0.21581337 = (MATCH) max plus 0.01 times others of:
          0.21581337 = (MATCH) weight(body_ru_RU:ташкент^0.8 in 2), product of:
            0.17610328 = queryWeight(body_ru_RU:ташкент^0.8), product of:
              0.8 = boost
              4.901973 = idf(docFreq=1)
              0.044906225 = queryNorm
            1.2254932 = (MATCH) fieldWeight(body_ru_RU:ташкент in 2), product of:
              1.0 = tf(termFreq(body_ru_RU:ташкент)=1)
              4.901973 = idf(docFreq=1)
              0.25 = fieldNorm(field=body_ru_RU, doc=2)
    </str>
  </lst>
</lst>

Apologies for the verbosity. Can anyone help me achieve my goal?

Thanks,
Stephanie
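P.S. To check my understanding of the parsed query above, here is a toy model (plain Python, not Solr code) of how I read it: each required DisjunctionMaxQuery is satisfied if any one of its per-field clauses matches, so a document matching ташкент in a Russian-analyzed field passes even though the literal-field clauses would require the full phrase "1970 ташкент". The field names and token lists below come from the debug output; the matching function is my own simplification, and it treats a phrase clause as a plain token-set requirement.

```python
# Toy model of the parsed dismax query above -- NOT Solr internals.
# A DisjunctionMaxQuery is satisfied when ANY of its field clauses matches.

def dismax_matches(doc_tokens, field_clauses):
    """field_clauses: list of (field, required_tokens). The disjunction
    matches if every required token appears in that single field."""
    return any(
        all(tok in doc_tokens.get(field, []) for tok in required)
        for field, required in field_clauses
    )

# The Russian analyzer dropped "1970", so the *_ru_RU clauses only
# require "ташкент"; the *_literal clauses still require both tokens.
clauses = [
    ("body_ru_RU", ["ташкент"]),
    ("body_literal", ["1970", "ташкент"]),
    ("location_ru_RU", ["ташкент"]),
    ("location_literal", ["1970", "ташкент"]),
]

# A doc that mentions Ташкент in its location but never contains 1970
# still satisfies the single required disjunction:
doc = {"location_ru_RU": ["ташкент"], "location_literal": ["ташкент"]}
print(dismax_matches(doc, clauses))  # True
```

This reproduces what I see: because "1970" never becomes a required clause of its own, mm=100% is already satisfied by the ташкент match alone.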