To do just what Jack described, I often write a Solr query component that does 
"query expansion".
Based on parameters that I can recognize as a language hint (e.g. the language 
of the environment the user searches in, or the browser's Accept-Language 
header), I reformulate the query into one that searches the fields for those 
languages, in order of preference.
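A rough sketch of how such a component could be wired into solrconfig.xml. The 
class `com.example.LanguageQueryExpansionComponent` is hypothetical (it is not 
part of Solr); it stands for a custom SearchComponent that reads the language 
hint passed by the application and rewrites `qf` before the standard query 
component parses the query:

```xml
<!-- Hypothetical custom component: reads a language hint request parameter
     and rewrites qf so the preferred languages' fields come first. -->
<searchComponent name="langExpansion"
                 class="com.example.LanguageQueryExpansionComponent"/>

<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">edismax</str>
    <!-- Fallback qf, used when no language hint is present -->
    <str name="qf">AllChamp^2.0 AllChamp_ar^2.0 AllChamp_en^2.0 AllChamp_fr^5.0</str>
  </lst>
  <!-- first-components run before the default components (query, facet, ...),
       so the rewritten qf is the one the edismax parser sees -->
  <arr name="first-components">
    <str>langExpansion</str>
  </arr>
</requestHandler>
```

With a hint such as `lang.hint=fr,en` (again a hypothetical parameter name), the 
component might produce something like `qf=AllChamp_fr^5.0 AllChamp_en^2.0`, 
i.e. only the hinted languages, boosted in preference order.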

I am sure this produces some noise, e.g. because the search corpus is not 
uniformly spread across languages, but… I have to accept that.

There are many other examples besides Jack's fine "raison d'être" one (I 
particularly like the way he describes the motivation for using it; I can almost 
hear people trying to carefully articulate it! ;-)).
Other examples of cross-language use include "gallicisms", e.g. in German 
(http://de.wikipedia.org/wiki/Liste_von_Gallizismen), or in the other languages 
linked from there.

For example, "direction" has different meanings in French (where it can mean 
the management staff) and in English (where it can mean the teacher's 
instruction); the same goes for "demonstration", and for "sitting" (an English 
word used in French).


paul

On 4 Jul. 2014, at 17:15, "Jack Krupansky" <j...@basetechnology.com> wrote:

> What leads you to believe that the user is not interested in occurrences of 
> the French phrase in English text? I mean, we English-speakers and writers 
> like to use French phrases to show how sophisticated we are! It's part of 
> our... raison d'être. If I do a Google search for "raison d'être", it doesn't 
> mysteriously show me only French documents.
> 
> So, usually, it needs to be a user preference - the user's preferred 
> language, and whether they want to search across documents in all languages 
> or just a subset of languages. And then, on the results page you can show the 
> language and a button to restrict a re-query to the specific language.
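Since the langid chain in the original message below writes the detected 
language into `language_s`, one way to get those per-language buttons is to 
facet on that field. A minimal sketch, assuming `language_s` is indexed and 
stored (as the `*_s` string dynamic field is in the stock example schema):

```xml
<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">edismax</str>
    <str name="qf">AllChamp^2.0 AllChamp_ar^2.0 AllChamp_en^2.0 AllChamp_fr^5.0</str>
    <!-- Per-language hit counts: one button per returned language -->
    <str name="facet">true</str>
    <str name="facet.field">language_s</str>
    <!-- Hide languages with no matches for this query -->
    <int name="facet.mincount">1</int>
  </lst>
</requestHandler>
```

Clicking, say, the "fr" button would then re-issue the same query with 
`fq=language_s:fr` added.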
> 
> If you really need to do this query language detection, the best approach is 
> to do it within your application layer (you can use the Google code for 
> language detection) and then send the query to the appropriate query request 
> handler, with a separate query request handler for each language that 
> optimizes the settings for that language, such as the language-specific 
> fields to use for the "qf" parameter.
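A minimal sketch of that per-language setup, reusing the `AllChamp_*` fields 
from the schema in the original message (the handler names are illustrative). 
Because `/select_fr` only searches `AllChamp_fr`, whose `text_fr` analyzer 
drops "nous" if it is listed in the French stopwords, English documents that 
contain "nous" are no longer matched on that word:

```xml
<!-- One handler per language; the application detects the query language
     and sends the request to the matching handler. -->
<requestHandler name="/select_fr" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">edismax</str>
    <str name="qf">AllChamp_fr</str>
  </lst>
</requestHandler>

<requestHandler name="/select_en" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">edismax</str>
    <str name="qf">AllChamp_en</str>
  </lst>
</requestHandler>

<requestHandler name="/select_ar" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">edismax</str>
    <str name="qf">AllChamp_ar</str>
  </lst>
</requestHandler>
```

Each handler can then be tuned for its language independently (boosts, `mm`, 
extra language-specific fields in `qf`).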
> 
> -- Jack Krupansky
> 
> -----Original Message----- From: benjelloun
> Sent: Friday, July 4, 2014 10:52 AM
> To: solr-user@lucene.apache.org
> Subject: multilingual search
> 
> Hello,
> 
> What I need is to detect the language of my fields, and then, when I search 
> with the "/select" RequestHandler, to detect the language of the query words 
> so that the search chooses which field_langid to use. How can I do that?
> 
> My config:
> 
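> <updateRequestProcessorChain name="langid">
>   <processor class="org.apache.solr.update.processor.LangDetectLanguageIdentifierUpdateProcessorFactory">
>     <lst name="defaults">
>       <bool name="langid">true</bool>
>       <!-- Fields whose language is detected -->
>       <str name="langid.fl">NomDocument,ContenuDocument,Postit</str>
>       <!-- Field that receives the detected language code -->
>       <str name="langid.langField">language_s</str>
>       <str name="langid.whitelist">en,fr,ar</str>
>       <str name="langid.fallback">fr</str>
>       <float name="langid.threshold">0.6</float>
>       <!-- Copy each field to <field>_<lang> (e.g. NomDocument_fr),
>            detecting per field and keeping the original field -->
>       <bool name="langid.map">true</bool>
>       <bool name="langid.map.individual">true</bool>
>       <bool name="langid.map.keepOrig">true</bool>
>     </lst>
>   </processor>
>   <!-- Without RunUpdateProcessorFactory the documents are never indexed -->
>   <processor class="solr.LogUpdateProcessorFactory"/>
>   <processor class="solr.RunUpdateProcessorFactory"/>
> </updateRequestProcessorChain>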
> 
> <field name="AllChamp_ar" type="text_ar" multiValued="true" indexed="true"
> required="false" stored="false"/>
> <field name="AllChamp_fr" type="text_fr" multiValued="true" indexed="true"
> required="false" stored="false"/>
> <field name="AllChamp_en" type="text_en" multiValued="true" indexed="true"
> required="false" stored="false"/>
> 
> <dynamicField name="*_en" type="text_en" indexed="true" stored="false"
> required="false" multiValued="true"/>
> <dynamicField name="*_fr" type="text_fr" indexed="true" stored="false"
> required="false" multiValued="true"/>
> <dynamicField name="*_ar" type="text_ar" indexed="true" stored="false"
> required="false" multiValued="true"/>
> 
> <copyField source="*_ar" dest="AllChamp_ar"/>
> <copyField source="*_fr" dest="AllChamp_fr"/>
> <copyField source="*_en" dest="AllChamp_en"/>
> 
> <requestHandler name="/select" class="solr.SearchHandler">
>   <lst name="defaults">
>     <str name="echoParams">explicit</str>
>     <int name="rows">10</int>
>     <str name="defType">edismax</str>
>     <str name="qf">AllChamp^2.0 AllChamp_ar^2.0 AllChamp_en^2.0 AllChamp_fr^5.0</str>
>   </lst>
> </requestHandler>
> 
> Example, searching in the Solr Admin: "nous présentons" is French, and "nous" 
> is in stopwords_fr. But when I search for "nous présentons", I get matches on 
> "nous" because I have some English docs that contain "nous".
> 
> This is just one example for one language. I don't want to add the French 
> stopwords to stopwords_en. What I want is to detect the language before the 
> /select search, and then choose the field_langid to search.
> 
> Best regards,
> Anass BENJELLOUN
> 
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/multilingual-search-tp4145639.html
> Sent from the Solr - User mailing list archive at Nabble.com. 
