Hello Patricio,

Language identification has been delegated to Tika since Nutch 1.4 (https://issues.apache.org/jira/browse/NUTCH-1075), so you should create your own models with Tika instead.

As for the second part of your question, this is more of a Solr issue; you'd get more help on the Solr list.
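For reference, here is a rough in-memory sketch of what building and testing a profile with the Tika 1.x language API (org.apache.tika.language) could look like. The corpus file name, the "sq" language code and the sample string are placeholders, and wiring the resulting profile into the Nutch plugin (e.g. as an .ngp resource on the classpath) is a separate step:

import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.InputStreamReader;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

import org.apache.tika.language.LanguageIdentifier;
import org.apache.tika.language.LanguageProfile;
import org.apache.tika.language.ProfilingWriter;

public class BuildLanguageProfile {
    public static void main(String[] args) throws Exception {
        // Accumulate n-gram statistics from a plain-text corpus
        // ("my-corpus.txt" is a placeholder file name)
        LanguageProfile profile = new LanguageProfile();
        try (Writer writer = new ProfilingWriter(profile);
             BufferedReader reader = new BufferedReader(
                 new InputStreamReader(new FileInputStream("my-corpus.txt"),
                                       StandardCharsets.UTF_8))) {
            char[] buf = new char[4096];
            int n;
            while ((n = reader.read(buf)) != -1) {
                writer.write(buf, 0, n);
            }
        }

        // Register the profile under an ISO 639 code (in memory only)
        LanguageIdentifier.addProfile("sq", profile);

        // Identify a snippet of text against the known profiles
        LanguageIdentifier identifier =
            new LanguageIdentifier("some sample text in the target language");
        System.out.println(identifier.getLanguage()
            + " (certain: " + identifier.isReasonablyCertain() + ")");
    }
}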
Best,
Julien

On 8 October 2012 02:11, Patricio Galeas <[email protected]> wrote:
> Hi,
> two years ago (with Nutch 1.0), I used the following command to create a
> new language profile:
>
>   nutch plugin language-identifier org.apache.nutch.analysis.lang.NGramProfile -create <profile-name> <filename> <encoding>
>
> Now, I am trying to do the same with Nutch 1.5, but
> org.apache.nutch.analysis.lang.NGramProfile does not exist.
> I tried the language-identifier and language-detector plugins, but their
> performance is not good enough for the language that I need to identify.
>
> I also tried the language detection in Solr, following the hints from
> http://wiki.apache.org/solr/LanguageDetection
> with the following configuration:
>
>   <updateRequestProcessorChain name="langid">
>     <processor class="org.apache.solr.update.processor.TikaLanguageIdentifierUpdateProcessorFactory">
>       <bool name="langid">true</bool>
>       <str name="langid.fl">content,title</str>
>       <str name="langid.whitelist">sq</str>
>       <str name="langid.langField">lang</str>
>       <str name="langid.fallback">en</str>
>     </processor>
>     <processor class="solr.LogUpdateProcessorFactory" />
>     <processor class="solr.RunUpdateProcessorFactory" />
>   </updateRequestProcessorChain>
>
> But after the indexing, the field "lang" was always empty.
>
> What am I doing wrong?
>
> Any help would be appreciated.
>
> Thanks
> Pat

--
Open Source Solutions for Text Engineering
http://digitalpebble.blogspot.com/
http://www.digitalpebble.com
http://twitter.com/digitalpebble

