It is much more efficient to tag documents with their language at index time.
Look for language identification tools such as 
http://www.sematext.com/products/language-identifier/index.html or 
http://ngramj.sourceforge.net/ or 
http://lucene.apache.org/nutch/apidocs-1.0/org/apache/nutch/analysis/lang/LanguageIdentifier.html
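
For illustration only, here is a rough Python sketch of how the two ideas in this thread could be combined: use the stopword-ratio heuristic Raimon mentions to guess the language at index time, then emit an update message with a lower document boost as Ahmet suggests. The stopword lists are tiny placeholders, and the field names "text" and "language" are assumptions, not part of any real schema. A dedicated language identifier like the ones linked above would likely be more reliable than this heuristic.

```python
import re
import xml.etree.ElementTree as ET

# Tiny illustrative stopword lists (assumption); real lists would be much longer.
STOPWORDS = {
    "en": {"the", "and", "of", "to", "in", "is", "that", "it", "for", "with"},
    "es": {"el", "la", "de", "que", "y", "en", "los", "un", "por", "con"},
}


def stopword_ratio(text, lang):
    """Fraction of tokens in `text` that are stopwords of `lang`."""
    tokens = re.findall(r"\w+", text.lower())
    if not tokens:
        return 0.0
    hits = sum(1 for token in tokens if token in STOPWORDS[lang])
    return hits / len(tokens)


def guess_language(text):
    """Return the language whose stopwords cover the largest share of tokens.

    On a tie (including text with no stopwords at all) the first
    language listed in STOPWORDS wins.
    """
    return max(STOPWORDS, key=lambda lang: stopword_ratio(text, lang))


def to_update_xml(fields, main_lang="en", off_lang_boost=0.5):
    """Build a Solr <add> message from a dict of field name -> value.

    Assumes the document body is in a hypothetical "text" field. Documents
    whose guessed language differs from `main_lang` get boost < 1.0, and
    every document gets a hypothetical "language" field.
    """
    lang = guess_language(fields["text"])
    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")
    if lang != main_lang:
        doc.set("boost", str(off_lang_boost))
    for name, value in {**fields, "language": lang}.items():
        field = ET.SubElement(doc, "field", name=name)
        field.text = value
    return ET.tostring(add, encoding="unicode")


if __name__ == "__main__":
    print(to_update_xml({"id": "1", "text": "the cat is in the house"}))
    print(to_update_xml({"id": "2", "text": "el gato está en la casa"}))
```

Storing the guessed language in its own field also makes it possible to filter or boost by language at query time instead of fixing the boost when the document is indexed.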

--
Jan Høydahl  - search architect
Cominvent AS - www.cominvent.com

On 9. feb. 2010, at 05.19, Lance Norskog wrote:

> There is
> 
> On Thu, Feb 4, 2010 at 10:07 AM, Raimon Bosch <raimon.bo...@gmail.com> wrote:
>> 
>> 
>> Yes, it's true that we could do it at index time if we had a way to know
>> the language. I was thinking of a solution at search time, maybe measuring
>> the percentage of stopwords in each document. Normally, a document in
>> another language will have hardly any stopwords of the index's main language.
>> 
>> If you know of any external software that detects the language of a source
>> text, that would be useful too.
>> 
>> Thanks,
>> Raimon Bosch.
>> 
>> 
>> 
>> Ahmet Arslan wrote:
>>> 
>>> 
>>>> In our indexes, we sometimes have documents written in a
>>>> language other than the index's main language. Is there any
>>>> way to give these documents a lower boost?
>>> 
>>> If you are aware of those documents, at index time you can boost those
>>> documents with a value less than 1.0:
>>> 
>>> <add>
>>>   <doc boost="0.5">
>>>     <!-- document written in another language -->
>>>     <field name="...">...</field>
>>>     <field name="...">...</field>
>>>   </doc>
>>> </add>
>>> 
>>> http://wiki.apache.org/solr/UpdateXmlMessages#Optional_attributes_on_.22doc.22
>>> 
>>> 
>>> 
>>> 
>>> 
>> 
>> --
>> View this message in context: 
>> http://old.nabble.com/Is-it-posible-to-exclude-results-from-other-languages--tp27455759p27457165.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
>> 
>> 
> 
> 
> 
> -- 
> Lance Norskog
> goks...@gmail.com
