Hi,

We have a problem with searches across multiple languages. Our schema looks something like this:
____
field_en = English content for field
field_es = Spanish
field_it = Italian
etc.
____

When a user searches for a keyword, e.g. "brexit", they can also specify the languages they want to see in the response, and the query is run against all of the requested language fields.

The issue is that for "brexit", Italian results get boosted more: a term like "Brexit" rarely occurs in Italian text, so its idf in field_it shoots up, and less relevant Italian docs end up ranking higher than the English ones.

Is there some way to deal with this problem? The current solutions we can think of:

1. Create a catch-all copyField and use that to score the docs. But this creates problems when a word is present in another language (e.g. English) and not in the language of the resulting document (Italian). We would also have to pay for the extra disk space of the copyField and deal with analysis across multiple languages.

2. Create a new scorer called "IDFGroupScorer" that wraps multiple fields and computes an aggregate idf (by averaging, or by taking the min/max) across the fields in the group.

Any thoughts on other solutions, or suggestions on how we could implement the IDFGroupScorer?

Thanks,
Sambhav
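
P.S. To make the idf effect and the "IDFGroupScorer" idea concrete, here is a rough sketch in plain Python (not Lucene/Solr code). It assumes a classic TF-IDF style idf of the form 1 + ln((N + 1) / (df + 1)); the document counts are made-up numbers for illustration only, and the function names are hypothetical.

```python
import math


def idf(doc_count, doc_freq):
    """Classic TF-IDF style idf (assumed formula): 1 + ln((N + 1) / (df + 1))."""
    return 1.0 + math.log((doc_count + 1) / (doc_freq + 1))


# Hypothetical per-field statistics for the term "brexit":
# field name -> (number of docs with that field, docs containing the term).
# These numbers are invented purely to illustrate the skew.
FIELD_STATS = {
    "field_en": (1_000_000, 50_000),
    "field_es": (200_000, 2_000),
    "field_it": (100_000, 100),
}


def per_field_idf(stats):
    """idf computed independently per language field (the current behaviour)."""
    return {field: idf(n, df) for field, (n, df) in stats.items()}


def group_idf(stats, mode="avg"):
    """One aggregate idf shared by every field in the group (the proposed idea)."""
    values = list(per_field_idf(stats).values())
    if mode == "avg":
        return sum(values) / len(values)
    if mode == "min":
        return min(values)
    if mode == "max":
        return max(values)
    raise ValueError(f"unknown mode: {mode!r}")


if __name__ == "__main__":
    for field, value in per_field_idf(FIELD_STATS).items():
        print(f"{field}: idf = {value:.3f}")
    for mode in ("avg", "min", "max"):
        print(f"group idf ({mode}) = {group_idf(FIELD_STATS, mode):.3f}")
```

With per-field statistics the term gets a much higher idf in field_it than in field_en, which is the skew described above. With a group idf every language field uses the same value for the term, so the remaining differences in score come from term frequency and length normalization rather than from how rare the word is in each language.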