François

I think there is a language identification tool in the Nutch code base; 
otherwise, I have written one in Perl that could easily be translated to Java. 
I won't have access to it for 10 days (I am traveling), but I am happy to send 
you (and anyone else who wants it) a link to it when I get back.

Cheers

François

On Mar 25, 2011, at 11:59 AM, Grant Ingersoll wrote:

> You are looking for a language identification tool.  You could check 
> https://issues.apache.org/jira/browse/SOLR-1979 for the start of this.  
> Otherwise, you have to roll your own or buy a third party one.
> 
> On Mar 24, 2011, at 12:24 PM, fr.jur...@voila.fr wrote:
> 
>> Hello Solrists,
>> 
>> As it says in the subject line, I'm looking for a Java component that,
>> given an ISO 639-1 code or some equivalent,
>> would return a Lucene Analyzer ready to gobble documents in the 
>> corresponding language.
>> Solr looks like it has to contain one,
>> only I've not been able to locate it so far; 
>> can you point me to it?
>> 
>> I've found org.apache.solr.analysis,
>> and things like org.apache.lucene.analysis.bg &c in lucene/modules,
>> with many classes which I'm sure are related, however the factory itself 
>> still eludes me;
>> I mean the Java class.method that'd decide, on request, what to do with all 
>> these packages
>> to bring the requisite object into existence, once the language is specified.
>> Where should I look? Or was I mistaken & Solr has nothing of the kind, at 
>> least in Java?
>> Thanks in advance for your help.
>> 
>> Best regards,
>>   François Jurain.
>> 
> 
> --------------------------
> Grant Ingersoll
> http://www.lucidimagination.com/
> 
> Search the Lucene ecosystem docs using Solr/Lucene:
> http://www.lucidimagination.com/search
> 
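
For readers landing on this thread: the original question asks for a factory that maps an ISO 639-1 code to a ready-made analyzer, and none of the replies here points to one. A minimal sketch of such a lookup, assuming you register the per-language classes yourself, might look like the following. The class name LanguageRegistry and its methods are hypothetical and not part of Solr or Lucene; the type parameter T stands in for org.apache.lucene.analysis.Analyzer so that the sketch compiles without Lucene on the classpath.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;
import java.util.function.Supplier;

/**
 * Hypothetical registry (not a Solr or Lucene class): maps ISO 639-1 codes
 * to factories that build a per-language object on request.
 */
final class LanguageRegistry<T> {
    private final Map<String, Supplier<? extends T>> factories = new HashMap<>();

    /** Registers a factory under a language code; codes are matched case-insensitively. */
    LanguageRegistry<T> register(String isoCode, Supplier<? extends T> factory) {
        factories.put(normalize(isoCode), factory);
        return this;
    }

    /** Builds a new instance for the code, or returns empty if nothing is registered for it. */
    Optional<T> create(String isoCode) {
        Supplier<? extends T> factory = factories.get(normalize(isoCode));
        if (factory == null) {
            return Optional.empty();
        }
        return Optional.of(factory.get());
    }

    /** Trims and lower-cases the code so that "BG" and " bg " hit the same entry. */
    private static String normalize(String isoCode) {
        return isoCode.trim().toLowerCase(Locale.ROOT);
    }
}
```

With Lucene available, T would be Analyzer and each registered supplier would construct one of the per-language analyzers from packages such as org.apache.lucene.analysis.bg. The constructor arguments depend on the Lucene version, so they are deliberately left out of this sketch. Returning an empty Optional for an unknown code leaves the choice of fallback analyzer to the caller.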
