[Nutch-dev] Re: lang identifier and nutch analyzer in trunk

Jérôme Charron Mon, 23 Jan 2006 05:19:05 -0800

> +1. Other local modifications which I use frequently:
>
> * exporting a list of supported languages,
>
> * exporting an NGramProfile of the analyzed text,
>
> * allow processing of chunks of input (i.e.
> LanguageIdentifier.identify(char[] buf, int start, int len) ) - this is
> very useful if the text to be analyzed is already present in memory, and
> the choice of sections (chunks) is made elsewhere, e.g. for documents
> with clearly outlined sections, or for multi-language documents.
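
A minimal sketch of what the chunk-based call quoted above could look like, assuming a toy trigram-overlap scorer. The class name ChunkIdentifierSketch and the helpers addLanguage, supportedLanguages and profile are hypothetical names for illustration; they are not the Nutch LanguageIdentifier or NGramProfile API. Only the identify(char[] buf, int start, int len) signature comes from the message.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical sketch; NOT the Nutch LanguageIdentifier implementation. */
public class ChunkIdentifierSketch {

    // Toy per-language trigram counts built from sample text (an assumption).
    private final Map<String, Map<String, Integer>> profiles = new HashMap<>();

    /** Registers a language from a short sample string. */
    public void addLanguage(String lang, String sample) {
        char[] buf = sample.toCharArray();
        profiles.put(lang, profile(buf, 0, buf.length));
    }

    /** Exports the supported languages in sorted order. */
    public Set<String> supportedLanguages() {
        return new TreeSet<>(profiles.keySet());
    }

    /** Exports trigram counts for buf[start, start + len) without copying the whole buffer. */
    public static Map<String, Integer> profile(char[] buf, int start, int len) {
        if (start < 0 || len < 0 || start > buf.length - len) {
            throw new IndexOutOfBoundsException("start=" + start + ", len=" + len);
        }
        Map<String, Integer> counts = new HashMap<>();
        for (int i = start; i + 3 <= start + len; i++) {
            String gram = new String(buf, i, 3).toLowerCase(Locale.ROOT);
            counts.merge(gram, 1, Integer::sum);
        }
        return counts;
    }

    /** Returns the language whose profile shares the most trigrams with the chunk, or null. */
    public String identify(char[] buf, int start, int len) {
        Map<String, Integer> chunk = profile(buf, start, len);
        String best = null;
        int bestScore = 0;
        for (String lang : supportedLanguages()) { // sorted, so ties resolve deterministically
            Map<String, Integer> known = profiles.get(lang);
            int score = 0;
            for (Map.Entry<String, Integer> e : chunk.entrySet()) {
                if (known.containsKey(e.getKey())) {
                    score += e.getValue();
                }
            }
            if (score > bestScore) {
                bestScore = score;
                best = lang;
            }
        }
        return best;
    }
}
```

With a multi-language document already held in a single char[], the caller picks the section boundaries and calls identify once per section, passing a different start and len each time.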


Thanks for these interesting comments, Andrzej. I'll add them to my todo
list.

Jérôme

--
http://motrech.free.fr/
http://www.frutch.org/
