Hello Patricio

Language identification has been delegated to Tika since Nutch 1.4
(https://issues.apache.org/jira/browse/NUTCH-1075), so you should create
your own models with Tika instead. As for the second part of your
question, that is really a Solr issue; you'd get more help on the Solr
list.
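
Roughly, building and plugging in a model with Tika's n-gram API looks
something like the sketch below. It is untested and written against the
org.apache.tika.language classes in tika-core; the corpus file name, the
"sq" language code and the LanguageIdentifier.addProfile() registration
step are assumptions on my side, so check them against the Tika version
you are running:

import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

import org.apache.tika.language.LanguageIdentifier;
import org.apache.tika.language.LanguageProfile;
import org.apache.tika.language.ProfilingWriter;

public class BuildLanguageModel {

    public static void main(String[] args) throws Exception {
        // 1. Stream a plain-text corpus of the target language through
        //    ProfilingWriter to accumulate an n-gram profile.
        //    "corpus-sq.txt" is a placeholder file name.
        LanguageProfile profile = new LanguageProfile();
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(
                new FileInputStream("corpus-sq.txt"), StandardCharsets.UTF_8));
             ProfilingWriter writer = new ProfilingWriter(profile)) {
            char[] buffer = new char[8192];
            int read;
            while ((read = reader.read(buffer)) != -1) {
                writer.write(buffer, 0, read);
            }
        }

        // 2. Register the profile under its ISO 639 code so the identifier
        //    considers it alongside the built-in ones. Assumed hook: if your
        //    Tika version has no addProfile(), extra profiles may instead
        //    need to be declared on the classpath.
        LanguageIdentifier.addProfile("sq", profile);

        // 3. Identify a piece of text against the registered profiles.
        LanguageIdentifier identifier =
                new LanguageIdentifier("some text in the target language ...");
        System.out.println(identifier.getLanguage()
                + " (reasonably certain: " + identifier.isReasonablyCertain() + ")");
    }
}

The profile above only lives in memory; persisting it and wiring it into
your Nutch/Tika setup is version-dependent, so the Tika docs are the
place to check for that part.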

Best

Julien

On 8 October 2012 02:11, Patricio Galeas <[email protected]> wrote:

> Hi,
> Two years ago, with Nutch 1.0, I used the following command to create a
> new language profile:
>
>   nutch plugin language-identifier org.apache.nutch.analysis.lang.NGramProfile -create <profile-name> <filename> <encoding>
>
> Now I am trying to do the same with Nutch 1.5, but
> org.apache.nutch.analysis.lang.NGramProfile no longer exists.
> I tried the language-identifier and language-detector plugins, but the
> performance is not good enough for the language I need to identify.
>
> I also tried language detection in Solr, following the hints from
> http://wiki.apache.org/solr/LanguageDetection with this configuration:
>
>     <updateRequestProcessorChain name="langid">
>       <processor
>         class="org.apache.solr.update.processor.TikaLanguageIdentifierUpdateProcessorFactory">
>         <bool name="langid">true</bool>
>         <str name="langid.fl">content,title</str>
>         <str name="langid.whitelist">sq</str>
>         <str name="langid.langField">lang</str>
>         <str name="langid.fallback">en</str>
>       </processor>
>       <processor class="solr.LogUpdateProcessorFactory" />
>       <processor class="solr.RunUpdateProcessorFactory" />
>     </updateRequestProcessorChain>
>
> But after indexing, the field "lang" was always empty.
>
> What am I doing wrong?
>
> Any help would be appreciated
>
> Thanks
> Pat
>



-- 
Open Source Solutions for Text Engineering

http://digitalpebble.blogspot.com/
http://www.digitalpebble.com
http://twitter.com/digitalpebble
