It is much more efficient to tag documents with their language at index time; you can then filter or down-boost on that field at search time instead of inspecting each document per query. Look for language identification tools such as:

http://www.sematext.com/products/language-identifier/index.html
http://ngramj.sourceforge.net/
http://lucene.apache.org/nutch/apidocs-1.0/org/apache/nutch/analysis/lang/LanguageIdentifier.html
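To make the index-time idea concrete, here is a minimal sketch that combines it with the stopword-ratio heuristic suggested earlier in the thread. It is not how any of the tools above work: the tiny stopword lists, the function names `guess_language` and `tag_document`, the `min_ratio` threshold and the `language` field name are all assumptions made up for illustration, and a real identifier (n-gram based, for instance) will be far more reliable.

```python
import re

# Illustrative stopword lists only (an assumption for this sketch);
# a real setup would use much larger lists or a dedicated library.
STOPWORDS = {
    "en": {"the", "and", "of", "to", "in", "is", "that", "it"},
    "es": {"el", "la", "de", "que", "y", "en", "los", "es"},
}


def guess_language(text, min_ratio=0.05):
    """Return the language whose stopwords make up the largest share
    of the tokens, or "unknown" if no share reaches min_ratio."""
    tokens = re.findall(r"\w+", text.lower())
    if not tokens:
        return "unknown"
    best_lang, best_ratio = "unknown", 0.0
    for lang, words in STOPWORDS.items():
        ratio = sum(1 for t in tokens if t in words) / len(tokens)
        if ratio > best_ratio:
            best_lang, best_ratio = lang, ratio
    return best_lang if best_ratio >= min_ratio else "unknown"


def tag_document(doc, text_field="text", lang_field="language"):
    """Return a copy of doc with a (hypothetical) language field added,
    ready to be sent to the indexer."""
    tagged = dict(doc)
    tagged[lang_field] = guess_language(doc.get(text_field, ""))
    return tagged
```

With such a field in the schema, documents in other languages could be excluded with a filter query (for example fq=language:en), or given a lower document boost when they are added.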
--
Jan Høydahl - search architect
Cominvent AS - www.cominvent.com

On 9. feb. 2010, at 05.19, Lance Norskog wrote:

> There is
>
> On Thu, Feb 4, 2010 at 10:07 AM, Raimon Bosch <raimon.bo...@gmail.com> wrote:
>>
>> Yes, It's true that we could do it in index time if we had a way to know. I
>> was thinking in some solution in search time, maybe measuring the % of
>> stopwords of each document. Normally, a document of another language won't
>> have any stopword of its main language.
>>
>> If you know some external software to detect the language of a source text,
>> it would be useful too.
>>
>> Thanks,
>> Raimon Bosch.
>>
>> Ahmet Arslan wrote:
>>>
>>>> In our indexes, sometimes we have some documents written in
>>>> other languages
>>>> different to the most common index's language. Is there any
>>>> way to give less
>>>> boosting to this documents?
>>>
>>> If you are aware of those documents, at index time you can boost those
>>> documents with a value less than 1.0:
>>>
>>> <add>
>>>   <doc boost="0.5">
>>>     // document written in other languages
>>>     <field name="...">...</field>
>>>     <field name="...">...</field>
>>>   </doc>
>>> </add>
>>>
>>> http://wiki.apache.org/solr/UpdateXmlMessages#Optional_attributes_on_.22doc.22
>>>
>>
>> --
>> View this message in context:
>> http://old.nabble.com/Is-it-posible-to-exclude-results-from-other-languages--tp27455759p27457165.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
>>
>
> --
> Lance Norskog
> goks...@gmail.com