On 8/18/11 12:38 PM, Olivier Grisel wrote:
True, but working on a generic API adapter would make it possible to benefit from the huge set of existing tokenizers / analyzers from the Lucene community. Although I am aware that most of the time Lucene analyzers drop punctuation information, which is mostly useless for Information Retrieval but often critical for NLP.
As far as I know, Lucene redistributes the Snowball stemmers; that could also be an option for us, since we would then directly have stemmers for all the languages we currently support. I do not really see a benefit in adapting Lucene analyzers: if someone wants to use a Lucene tokenizer instead of an OpenNLP one, they can simply do that and then provide the tokenized text to OpenNLP. That is already supported.

Jörn
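A minimal sketch of the workflow Jörn describes: tokenize with a Lucene tokenizer, then hand the token array to an OpenNLP component (here a POS tagger). The model file name "en-pos-maxent.bin" is assumed for illustration, and the StandardTokenizer setup shown matches more recent Lucene releases; older versions take a Reader (and a Version) in the constructor instead.

```java
import java.io.FileInputStream;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.standard.StandardTokenizer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

import opennlp.tools.postag.POSModel;
import opennlp.tools.postag.POSTaggerME;

public class LuceneTokensToOpenNLP {

    public static void main(String[] args) throws Exception {
        String text = "OpenNLP and Lucene can work together on the same text.";

        // Tokenize with a Lucene tokenizer (constructor/setup details vary by Lucene version).
        List<String> tokens = new ArrayList<String>();
        StandardTokenizer tokenizer = new StandardTokenizer();
        tokenizer.setReader(new StringReader(text));
        CharTermAttribute term = tokenizer.addAttribute(CharTermAttribute.class);
        tokenizer.reset();
        while (tokenizer.incrementToken()) {
            tokens.add(term.toString());
        }
        tokenizer.end();
        tokenizer.close();

        // Feed the pre-tokenized text to an OpenNLP component, e.g. the POS tagger.
        // "en-pos-maxent.bin" is an assumed model file path.
        POSModel model = new POSModel(new FileInputStream("en-pos-maxent.bin"));
        POSTaggerME tagger = new POSTaggerME(model);
        String[] tags = tagger.tag(tokens.toArray(new String[0]));

        for (int i = 0; i < tags.length; i++) {
            System.out.println(tokens.get(i) + "/" + tags[i]);
        }
    }
}
```

Note that a Lucene analyzer chain may have already lowercased, stemmed, or dropped punctuation tokens by this point, which is exactly the caveat Olivier raises above.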
