Hi Richard,

Thanks for the comments. For future reference, note that even though the
Javadoc is not that clear (we will change that), the manual does provide
explicit code showing how to use the LemmatizerME API:

http://opennlp.apache.org/documentation/1.7.1/manual/opennlp.html#tools.lemmatizer.tagging.api
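
In short, the code there boils down to something like the following (an
untested sketch; the model path and the example tokens/POS tags below are
just placeholders):

import java.io.FileInputStream;
import java.io.InputStream;

import opennlp.tools.lemmatizer.LemmatizerME;
import opennlp.tools.lemmatizer.LemmatizerModel;

public class LemmatizerExample {

  public static void main(String[] args) throws Exception {
    // Load a trained lemmatizer model (placeholder path).
    LemmatizerModel model;
    try (InputStream modelIn = new FileInputStream("en-lemmatizer.bin")) {
      model = new LemmatizerModel(modelIn);
    }

    LemmatizerME lemmatizer = new LemmatizerME(model);

    // Placeholder input; the token and POS tag arrays must be aligned.
    String[] tokens = { "the", "dependency", "parses", "were", "correct", "." };
    String[] postags = { "DT", "NN", "NNS", "VBD", "JJ", "." };

    // In 1.7.1 this returns the predicted classes (encoded lemmas),
    // which still need to be decoded -- see below.
    String[] predictions = lemmatizer.lemmatize(tokens, postags);
  }
}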

The lemmatize() method in LemmatizerME is also used for evaluation, where no
decoding is required, which is why it returns the encoded predictions. In any
case, some changes are pending in the Lemmatizer API, so we will try to
address this issue and make it easier to use.
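
Until then, the piece that is easy to miss when using the API directly is
the decoding step. Something along these lines should do it (again an
untested sketch; if I remember correctly the method is decodeLemmas() in
LemmatizerME, please double-check the exact name in the 1.7.1 Javadoc):

import opennlp.tools.lemmatizer.LemmatizerME;

public class LemmaDecodingExample {

  // Predict the encoded lemma classes and decode them into surface lemmas.
  static String[] predictLemmas(LemmatizerME lemmatizer, String[] tokens,
      String[] postags) {
    // Encoded predictions (shortest edit script classes), as used by the
    // evaluator.
    String[] predictions = lemmatizer.lemmatize(tokens, postags);
    // Apply the predicted edit scripts to the word forms to get the lemmas.
    return lemmatizer.decodeLemmas(tokens, predictions);
  }
}

And yes, as you noticed, the same encoding has to happen on the training
side: the samples fed to the trainer should contain the shortest edit
scripts rather than the raw lemmas (that is what getShortestEditScript()
is for in the sample stream).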

Cheers,

Rodrigo

On Wed, Jan 25, 2017 at 10:59 PM, Richard Eckart de Castilho <r...@apache.org> wrote:

> On 25.01.2017, at 18:38, Rodrigo Agerri <rage...@apache.org> wrote:
> >
> > When I run the OpenNLP model just trained, these two words get lemmatized
> > as expected (constituent and dependency). I guess something went wrong in
> > your training process.
>
> I figured out what went wrong. I implemented my own LemmaSampleStream and
> inside that, I didn't call getShortestEditScript(). I also didn't decode
> the output explicitly.
>
> The OpenNLP Lemmatizer API doesn't really indicate that these extra steps
> are necessary. I remember having done them when wrapping the original
> IXA implementation, but given the API design in OpenNLP, it appeared to
> me that they were no longer necessary with the OpenNLP implementation.
>
> The Lemmatizer interface only has a lemmatize() method; the decode()
> method is only available in LemmatizerME. Also, the LemmaSample Javadoc
> doesn't indicate at all that the lemmas need to be encoded.
>
> IMHO it would be much less confusing to the user if LemmatizerME.train()
> did the encoding internally and lemmatize() did the decoding internally.
>
> Anyway, the accuracy is now much better. Thanks for the tip!
>
> -- Richard
>
