Thank you for your answer! I created the new .ngp file for the new language using tika-app-1.0.jar. A post <http://stackoverflow.com/questions/9044916/how-can-i-detect-farsi-web-pages-by-tika/9045385> on Stack Overflow recommends using the tika.language.override.properties file to configure the new language in Tika, but I can't find any information on how to configure it in Nutch.
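For reference, here is how the override mechanism from that Stack Overflow answer appears to work, as far as I can tell from Tika 1.x's LanguageIdentifier. On startup it looks for tika.language.override.properties in the org/apache/tika/language/ package on the classpath and, if it finds one, uses it instead of the bundled tika.language.properties. The "languages" key lists the language codes, and each code needs a matching <code>.ngp profile in the same package directory. Because the override replaces the default list rather than extending it, it should repeat the built-in languages and append the new one. The JDK-only sketch below writes such an override file plus the profile under a chosen classpath root. TikaLanguageOverride and its methods are hypothetical helpers, not part of Tika or Nutch. Which root the JVM running Nutch's indexing job actually sees (for example the conf/ directory that bin/nutch puts on the classpath in local mode) is an assumption you should verify for your own setup.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Hypothetical helper: installs a custom Tika language profile via the override file.
class TikaLanguageOverride {

    // Package directory (relative to a classpath root) where Tika 1.x is assumed to look
    // for its language properties and <code>.ngp profiles.
    static final String PACKAGE_DIR = "org/apache/tika/language";
    static final String DEFAULT_FILE = "tika.language.properties";
    static final String OVERRIDE_FILE = "tika.language.override.properties";

    // Copies the default settings and appends the new code to the comma-separated "languages" key.
    static Properties withLanguage(Properties defaults, String code, String displayName) {
        Properties merged = new Properties();
        merged.putAll(defaults);
        List<String> codes = new ArrayList<>();
        for (String c : defaults.getProperty("languages", "").split(",")) {
            if (!c.trim().isEmpty()) {
                codes.add(c.trim());
            }
        }
        if (!codes.contains(code)) {
            codes.add(code);
        }
        merged.setProperty("languages", String.join(",", codes));
        merged.setProperty("name." + code, displayName);
        return merged;
    }

    // Writes the override file and copies the profile to <root>/org/apache/tika/language/.
    static Path install(Path root, Properties config, String code, Path ngpProfile) throws IOException {
        Path dir = root.resolve(PACKAGE_DIR);
        Files.createDirectories(dir);
        try (OutputStream out = Files.newOutputStream(dir.resolve(OVERRIDE_FILE))) {
            config.store(out, "Tika language override (built-in languages plus " + code + ")");
        }
        Files.copy(ngpProfile, dir.resolve(code + ".ngp"), StandardCopyOption.REPLACE_EXISTING);
        return dir;
    }

    // Reads Tika's bundled defaults if tika-core is on this JVM's classpath; empty otherwise.
    static Properties loadBundledDefaults() throws IOException {
        Properties p = new Properties();
        try (InputStream in = ClassLoader.getSystemResourceAsStream(PACKAGE_DIR + "/" + DEFAULT_FILE)) {
            if (in != null) {
                p.load(in);
            }
        }
        return p;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 4) {
            System.err.println("usage: TikaLanguageOverride <classpath-root> <code> <name> <profile.ngp>");
            return;
        }
        Properties defaults = loadBundledDefaults();
        if (defaults.getProperty("languages") == null) {
            // Without the defaults, the override would make Tika know only the new language.
            System.err.println("warning: tika-core defaults not found on the classpath");
        }
        Properties config = withLanguage(defaults, args[1], args[2]);
        Path dir = install(Paths.get(args[0]), config, args[1], Paths.get(args[3]));
        System.out.println("languages=" + config.getProperty("languages") + " -> " + dir);
    }
}
```

For example, running it with tika-core on the classpath and the arguments "conf sq Albanian sq.ngp" would write conf/org/apache/tika/language/tika.language.override.properties and sq.ngp. The real test is still whether Nutch's lang field shows the new code for a page you know is in that language.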
I'd really appreciate some hints on how to configure the new language in Nutch.

Thanks,
Patricio

2012/10/8 Julien Nioche <[email protected]>

> Hello Patricio
>
> Language identification has been delegated to Tika since 1.4
> (https://issues.apache.org/jira/browse/NUTCH-1075), so you should create
> your own models with Tika instead. As for the second part of your question,
> this is more of a SOLR issue; you'd get more help on the SOLR list.
>
> Best
>
> Julien
>
> On 8 October 2012 02:11, Patricio Galeas <[email protected]> wrote:
>
> > Hi,
> > two years ago (with Nutch 1.0), I used the following command to create a
> > new language profile:
> >
> > nutch plugin language-identifier org.apache.nutch.analysis.lang.NGramProfile -create <profile-name> <filename> <encoding>
> >
> > Now I'm trying to do the same with Nutch 1.5, but
> > org.apache.nutch.analysis.lang.NGramProfile does not exist.
> > I tried the language-identifier and language-detector plugins, but the
> > performance is not good enough for the language I need to identify.
> >
> > I also tried language detection in Solr, following the hints from
> > http://wiki.apache.org/solr/LanguageDetection
> > with the following configuration:
> >
> > <updateRequestProcessorChain name="langid">
> >   <processor class="org.apache.solr.update.processor.TikaLanguageIdentifierUpdateProcessorFactory">
> >     <bool name="langid">true</bool>
> >     <str name="langid.fl">content,title</str>
> >     <str name="langid.whitelist">sq</str>
> >     <str name="langid.langField">lang</str>
> >     <str name="langid.fallback">en</str>
> >   </processor>
> >   <processor class="solr.LogUpdateProcessorFactory" />
> >   <processor class="solr.RunUpdateProcessorFactory" />
> > </updateRequestProcessorChain>
> >
> > But after indexing, the "lang" field was always empty.
> >
> > What am I doing wrong?
> >
> > Any help would be appreciated.
> >
> > Thanks
> > Pat
>
> --
> Open Source Solutions for Text Engineering
>
> http://digitalpebble.blogspot.com/
> http://www.digitalpebble.com
> http://twitter.com/digitalpebble
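On the Solr side, one plausible reason the lang field stays empty is that an updateRequestProcessorChain defined in solrconfig.xml only runs when an update request selects it. That selection happens either through the update.chain request parameter or through the defaults of the /update request handler. With langid.fallback set to en, a chain that actually ran should have written at least "en". If Nutch's own language-identifier plugin already sends a lang value, the chain keeps that value unless langid.overwrite is true. Separately, as far as I know, Tika's bundled profiles do not include Albanian (sq), so with that whitelist the result would fall back to en until a custom sq profile is also on Solr's classpath. The JDK-only sketch below posts one test document while explicitly selecting the langid chain. SolrLangidCheck is a hypothetical helper. It assumes Java 11+, the XML /update handler, and a schema with id, title and content fields like the one Nutch ships.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

// Hypothetical diagnostic: send one document through a named Solr update chain.
class SolrLangidCheck {

    // Escapes characters that are special inside XML text and attribute values.
    static String escapeXml(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }

    // Solr XML update message carrying the fields listed in langid.fl (content,title).
    static String buildAddXml(String id, String title, String content) {
        return "<add><doc>"
                + "<field name=\"id\">" + escapeXml(id) + "</field>"
                + "<field name=\"title\">" + escapeXml(title) + "</field>"
                + "<field name=\"content\">" + escapeXml(content) + "</field>"
                + "</doc></add>";
    }

    // POST to /update, selecting the chain for this request only via update.chain.
    static HttpRequest buildRequest(String solrBase, String chain, String xml) {
        String url = solrBase + "/update?commit=true&update.chain="
                + URLEncoder.encode(chain, StandardCharsets.UTF_8);
        return HttpRequest.newBuilder(URI.create(url))
                .header("Content-Type", "text/xml; charset=UTF-8")
                .POST(HttpRequest.BodyPublishers.ofString(xml, StandardCharsets.UTF_8))
                .build();
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("usage: SolrLangidCheck <solr-base-url> <sample-text>");
            return;
        }
        String xml = buildAddXml("langid_test_1", "langid test", args[1]);
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(buildRequest(args[0], "langid", xml), HttpResponse.BodyHandlers.ofString());
        System.out.println("HTTP " + response.statusCode());
        System.out.println(response.body());
        // Then inspect the result with: <solr-base-url>/select?q=id:langid_test_1&fl=id,lang
    }
}
```

If lang is filled for this document but not for documents sent by Nutch, the chain probably needs to become the default for the handler Nutch posts to, for example with <str name="update.chain">langid</str> inside the <lst name="defaults"> of the /update request handler (older Solr releases used update.processor instead).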

