Hi Solrists, 

Thank you for your kind responses.
Grant, François: I'll keep your advice in mind and your links on hand;
they may be useful in one of my use cases,
even though I doubt they will be in the primary one.
As RDF literals, my documents carry language tags (@en, @fr, @ja, etc.)
following ISO 639-1, so language identification is straightforward.
What I'm trying to automate is the discovery of the corresponding
Analyzer subclasses and their constructors.
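Reducing a literal's language tag to a two-letter code could look like the sketch below, assuming Java 7+ for `Locale.forLanguageTag`. RDF language tags follow BCP 47, so a tag may carry extra subtags such as a region (`en-GB`). The helper `LangTags.iso6391` is hypothetical: it keeps only the primary language subtag and rejects it when it is not two letters long.

```java
import java.util.Locale;
import java.util.Optional;

public class LangTags {

    /**
     * Hypothetical helper: reduces an RDF literal's language tag (e.g. "en",
     * "en-GB", "FR") to a two-letter ISO 639-1 code, or returns empty when the
     * primary subtag is not two letters (e.g. a three-letter ISO 639-2/3 code).
     */
    public static Optional<String> iso6391(String languageTag) {
        // Locale.forLanguageTag parses BCP 47 and lower-cases the language subtag.
        String lang = Locale.forLanguageTag(languageTag).getLanguage();
        return lang.length() == 2 ? Optional.of(lang) : Optional.empty();
    }

    public static void main(String[] args) {
        for (String tag : new String[] {"en", "en-GB", "FR", "ja", "ast"}) {
            System.out.println(tag + " -> " + iso6391(tag).orElse("(no ISO 639-1 code)"));
        }
    }
}
```

Older JDKs report a few languages under legacy codes (`iw` for Hebrew, for example), so a lookup table keyed on these codes may need to account for both forms.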
 
In Solr, it seems to be up to the server admins to specify in XML
which Analyzer subclasses they want in a given case;
Solr then instantiates those subclasses by Java reflection.
I would like to spare myself the burden of writing and maintaining this XML.
 
Rather, I'd use Java code to build the mapping
by inventorying the classpath, with rules like
"on finding jar entry /whats/this/package/analysis/xx/WhatsThisAnalyzer.class,
if class WhatsThisAnalyzer is a subclass of lucene.analysis.Analyzer,
if reflection reveals a public constructor WhatsThisAnalyzer(lucene.util.Version),
and if instantiation succeeds,
then the instance is the presumptive default analyzer for ISO 639-1 code xx".
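The discovery rules above could be sketched roughly as follows. This is a minimal, stdlib-only sketch and not an existing Lucene or Solr API: the class `AnalyzerDiscovery`, its methods, and the `.../analysis/xx/...Analyzer.class` naming convention are all assumptions. The base type and the constructor parameter type are passed in as `Class` objects so the code compiles without Lucene; with Lucene they would presumably be `Analyzer.class` and `Version.class`.

```java
import java.io.IOException;
import java.lang.reflect.Constructor;
import java.lang.reflect.Modifier;
import java.util.Enumeration;
import java.util.Map;
import java.util.Optional;
import java.util.TreeMap;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AnalyzerDiscovery {

    // Assumed naming convention: .../analysis/xx/SomethingAnalyzer.class, where the
    // two-letter package segment xx is taken to be the language code.
    private static final Pattern ENTRY =
        Pattern.compile("((?:[\\w$]+/)*analysis/([a-z]{2})/\\w*Analyzer)\\.class");

    /** Returns {code, binary class name} if the jar entry fits the convention. */
    public static Optional<String[]> parseEntry(String entryName) {
        Matcher m = ENTRY.matcher(entryName);
        if (!m.matches()) {
            return Optional.empty();
        }
        return Optional.of(new String[] {m.group(2), m.group(1).replace('/', '.')});
    }

    /**
     * Loads className and instantiates it through a public one-argument constructor,
     * provided it is a concrete subtype of baseType. Any failure yields empty.
     */
    public static Optional<Object> tryInstantiate(String className, Class<?> baseType,
            Class<?> ctorParamType, Object ctorArg, ClassLoader loader) {
        try {
            Class<?> c = Class.forName(className, false, loader);
            if (!baseType.isAssignableFrom(c) || Modifier.isAbstract(c.getModifiers())) {
                return Optional.empty();
            }
            Constructor<?> ctor = c.getConstructor(ctorParamType); // public constructors only
            return Optional.of(ctor.newInstance(ctorArg));
        } catch (ReflectiveOperationException | LinkageError | RuntimeException e) {
            return Optional.empty(); // not usable as a presumptive default
        }
    }

    /** Scans one jar; the first instance created for a given code wins. */
    public static Map<String, Object> scanJar(String jarPath, Class<?> baseType,
            Class<?> ctorParamType, Object ctorArg, ClassLoader loader) throws IOException {
        Map<String, Object> byCode = new TreeMap<>();
        try (JarFile jar = new JarFile(jarPath)) {
            Enumeration<JarEntry> entries = jar.entries();
            while (entries.hasMoreElements()) {
                parseEntry(entries.nextElement().getName()).ifPresent(p ->
                    tryInstantiate(p[1], baseType, ctorParamType, ctorArg, loader)
                        .ifPresent(instance -> byCode.putIfAbsent(p[0], instance)));
            }
        }
        return byCode;
    }
}
```

With a Lucene 3.1-era analyzers jar on the classpath, a call such as `scanJar(path, Analyzer.class, Version.class, Version.LUCENE_31, loader)` would return the code-to-analyzer map (the jar path and version constant are assumptions). Entries are visited in the jar's own order, so when one package holds several analyzers the choice is arbitrary; collecting every candidate per code would make that ambiguity visible. Package names also do not always match ISO 639-1: Lucene 3.x has `br` for BrazilianAnalyzer (ISO `br` is Breton) and `cz` for CzechAnalyzer (ISO `cs`), so a small override table would still be needed.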
 
This might make a better Lucene contribution than a Solr one.
 
Thanks again for your time & your help.
Best regards,
     François Jurain.

> Message dated 25/03/11 at 23:06
> From: "François Schiettecatte" <fschietteca...@gmail.com>
> To: solr-user@lucene.apache.org
> Cc:
> Subject: Re: Wanted: a directory of quick-and-(not too)dirty analyzers for
> multi-language RDF.
> 
> 
> François
> 
> I think there is a language identification tool in the Nutch code base;
> otherwise, I have written one in Perl that could easily be translated to
> Java. I won't have access to it for 10 days (I am traveling), but I am happy
> to send a link to you (and to anyone else who wants it) when I get back.
> 
> Cheers
> 
> François
> 
> On Mar 25, 2011, at 11:59 AM, Grant Ingersoll wrote:
> 
> > You are looking for a language identification tool.  You could check 
> > https://issues.apache.org/jira/browse/SOLR-1979 for the start of this.  
> > Otherwise, you have to roll your own or buy a third party one.
> > 
> > On Mar 24, 2011, at 12:24 PM, fr.jur...@voila.fr wrote:
> > 
> >> Hello Solrists,
> >> 
> >> As the subject line says, I'm looking for a Java component that,
> >> given an ISO 639-1 code or some equivalent,
> >> would return a Lucene Analyzer ready to gobble documents in the
> >> corresponding language.
> >> Solr looks like it must contain one,
> >> but I haven't been able to locate it so far;
> >> can you point me to it?
> >> 
> >> I've found org.apache.solr.analysis,
> >> and things like org.apache.lucene.analysis.bg etc. in lucene/modules,
> >> with many classes that I'm sure are related; however, the factory itself
> >> still eludes me.
> >> I mean the Java class.method that would decide, on request, what to do
> >> with all these packages
> >> to bring the requisite object into existence once the language is specified.
> >> Where should I look? Or am I mistaken, and Solr has nothing of the kind,
> >> at least in Java?
> >> Thanks in advance for your help.
> >> 
> >> Best regards,
> >>   François Jurain.
> >> 
> > 
> > --------------------------
> > Grant Ingersoll
> > http://www.lucidimagination.com/
> > 
> > Search the Lucene ecosystem docs using Solr/Lucene:
> > http://www.lucidimagination.com/search
> > 
> 
> 
>
