Hi Solrists,

Thank you for your kind responses. Grant, François, I'll keep your advice in mind & your links in store; they may be useful in one of my use cases, though I doubt they will be in the primary one.

As RDF literals, my documents are affixed with language tags @en, @fr, @ja &c, as per ISO 639-1, so language identification is straightforward. It's the discovery of the corresponding Analyzer subclasses & constructors that I'm trying to automate.

With Solr, it looks like it's up to the server admins to specify in XML which Analyzer subclasses they want in a given case, then up to Solr to instantiate those subclasses by Java reflection. I would like to spare myself the burden of writing & maintaining this XML. Rather, I'd use Java code to build the mapping by inventorying the classpath, with rules like "on finding jar entry /whats/this/package/analysis/xx/WhatsThisAnalyzer.class, if class WhatsThisAnalyzer is a subclass of lucene.analysis.Analyzer, if reflection reveals a public constructor WhatsThisAnalyzer(lucene.util.Version), and if instantiation succeeds, then the instance is the presumptive default analyzer for ISO 639-1 code xx".

That might make a Lucene submission, more properly than a Solr one.

Thanks again for your time & your help.

Best regards,
François Jurain.
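P.S. To make the rule above concrete, here is a minimal sketch of what I have in mind. It is only an illustration under stated assumptions: the class and method names (AnalyzerDiscovery, languageCode, className, tryInstantiate, scan) are hypothetical, it only recognises two-letter package segments, and it is kept free of any Lucene dependency by taking the base type and the constructor argument type as parameters. For Lucene one would presumably pass the Analyzer class and the Version class.

```java
import java.lang.reflect.Constructor;
import java.lang.reflect.Modifier;
import java.util.Collections;
import java.util.Map;
import java.util.Optional;
import java.util.TreeMap;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Lucene or Solr.
final class AnalyzerDiscovery {
    // Matches e.g. "some/package/analysis/xx/WhatsThisAnalyzer.class".
    // Group 1: class path without ".class"; group 2: two-letter code.
    // '$' is excluded from the simple name so nested classes are skipped.
    private static final Pattern ENTRY = Pattern.compile(
            "/?((?:[^/]+/)*analysis/([a-z]{2})/[^/$]+Analyzer)\\.class");

    /** Two-letter package segment of a matching entry, if any. */
    static Optional<String> languageCode(String entryName) {
        Matcher m = ENTRY.matcher(entryName);
        return m.matches() ? Optional.of(m.group(2)) : Optional.empty();
    }

    /** Fully qualified class name of a matching entry, if any. */
    static Optional<String> className(String entryName) {
        Matcher m = ENTRY.matcher(entryName);
        return m.matches()
                ? Optional.of(m.group(1).replace('/', '.'))
                : Optional.empty();
    }

    /** Instantiates candidate via a public one-argument constructor, or gives up. */
    static <T> Optional<T> tryInstantiate(Class<?> candidate, Class<T> baseType,
                                          Class<?> argType, Object arg) {
        if (!baseType.isAssignableFrom(candidate)
                || Modifier.isAbstract(candidate.getModifiers())) {
            return Optional.empty();
        }
        try {
            // getConstructor only sees public constructors.
            Constructor<?> ctor = candidate.getConstructor(argType);
            return Optional.of(baseType.cast(ctor.newInstance(arg)));
        } catch (ReflectiveOperationException | RuntimeException e) {
            return Optional.empty(); // no such constructor, or it failed
        }
    }

    /** Maps each two-letter code to the first instance that could be built. */
    static <T> Map<String, T> scan(JarFile jar, ClassLoader loader,
                                   Class<T> baseType, Class<?> argType, Object arg) {
        Map<String, T> byLanguage = new TreeMap<>();
        for (JarEntry entry : Collections.list(jar.entries())) {
            String name = entry.getName();
            Optional<String> code = languageCode(name);
            if (!code.isPresent() || byLanguage.containsKey(code.get())) {
                continue;
            }
            try {
                // Load without initialising; the loader must be able to see the jar.
                Class<?> candidate = Class.forName(className(name).get(), false, loader);
                tryInstantiate(candidate, baseType, argType, arg)
                        .ifPresent(instance -> byLanguage.put(code.get(), instance));
            } catch (ClassNotFoundException | LinkageError e) {
                // Entry is not loadable from this class loader: skip it.
            }
        }
        return byLanguage;
    }
}
```

The first analyzer found for a given code wins, which is all "presumptive default" means here; a real version would need a tie-breaking rule when a package holds several candidates, and some handling for packages whose names are not two-letter codes.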
> Message of 25/03/11 at 23:06
> From: "François Schiettecatte" <fschietteca...@gmail.com>
> To: solr-user@lucene.apache.org
> Cc:
> Subject: Re: Wanted: a directory of quick-and-(not too)dirty analyzers for
> multi-language RDF.
>
> François
>
> I think there is a language identification tool in the Nutch code base,
> otherwise I have written one in Perl which could easily be translated to
> Java. I won't have access to it for 10 days (I am traveling), but I am happy
> to send you a link to it when I get back (and anyone else who wants it).
>
> Cheers
>
> François
>
> On Mar 25, 2011, at 11:59 AM, Grant Ingersoll wrote:
>
> > You are looking for a language identification tool. You could check
> > https://issues.apache.org/jira/browse/SOLR-1979 for the start of this.
> > Otherwise, you have to roll your own or buy a third party one.
> >
> > On Mar 24, 2011, at 12:24 PM, fr.jur...@voila.fr wrote:
> >
> >> Hello Solrists,
> >>
> >> As it says in the subject line, I'm looking for a Java component that,
> >> given an ISO 639-1 code or some equivalent,
> >> would return a Lucene Analyzer ready to gobble documents in the
> >> corresponding language.
> >> Solr looks like it has to contain one,
> >> only I've not been able to locate it so far;
> >> can you point out the spot?
> >>
> >> I've found org.apache.solr.analysis,
> >> and things like org.apache.lucene.analysis.bg &c in lucene/modules,
> >> with many classes which I'm sure are related, however the factory itself
> >> still eludes me;
> >> I mean the Java class.method that'd decide on request, what to do with all
> >> these packages
> >> to bring the requisite object into existence, once the language is specified.
> >> Where should I look? Or was I mistaken & Solr has nothing of the kind, at
> >> least in Java?
> >> Thanks in advance for your help.
> >>
> >> Best regards,
> >> François Jurain.
> >
> > --------------------------
> > Grant Ingersoll
> > http://www.lucidimagination.com/