+1 to make this easier. I also had a look at the interface and it should be
possible to use LemmatizerME just via the Lemmatizer interface.

Would it be possible to do the encoding/decoding internally like Richard
suggests?
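
For what it's worth, the core of the encode/decode step can be shown with a tiny self-contained sketch. This is not OpenNLP's actual getShortestEditScript() implementation, just a simplified suffix-transformation codec to illustrate why both directions are needed:

```java
// Simplified sketch of the idea behind lemma encoding/decoding:
// the classifier is trained on classes like "3|y" instead of raw
// lemmas, so its predictions must be decoded back into lemmas.
// (Not OpenNLP's actual getShortestEditScript()/decodeLemmas() code.)
final class LemmaCodec {

    // Encode a (word, lemma) pair as "cutLength|suffix", e.g.
    // ("dependencies", "dependency") -> "3|y".
    static String encode(String word, String lemma) {
        int p = 0;
        while (p < word.length() && p < lemma.length()
                && word.charAt(p) == lemma.charAt(p)) {
            p++;
        }
        return (word.length() - p) + "|" + lemma.substring(p);
    }

    // Apply an encoded class back onto a word form:
    // cut the given number of trailing chars, append the suffix.
    static String decode(String word, String cls) {
        int bar = cls.indexOf('|');
        int cut = Integer.parseInt(cls.substring(0, bar));
        return word.substring(0, word.length() - cut) + cls.substring(bar + 1);
    }
}
```

If I read the API right, the analogous pieces in OpenNLP are getShortestEditScript() on the training side and decodeLemmas() (or the lemmatize() convenience method in LemmatizerME) on the prediction side.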

Jörn

On Wed, Jan 25, 2017 at 10:59 PM, Richard Eckart de Castilho <r...@apache.org> wrote:

> On 25.01.2017, at 18:38, Rodrigo Agerri <rage...@apache.org> wrote:
> >
> > When I run the OpenNLP model just trained, these two words get
> > lemmatized as expected (constituent and dependency). I guess something
> > went wrong in your training process.
>
> I figured out what went wrong. I implemented my own LemmaSampleStream and
> inside that, I didn't call getShortestEditScript(). I also didn't decode
> the output explicitly.
>
> The OpenNLP Lemmatizer API doesn't really indicate that these extra steps
> are necessary. I remembered having done them when wrapping the original
> IXA implementation, but given the API design in OpenNLP, it had appeared
> to me that they were no longer necessary with the OpenNLP implementation.
>
> The Lemmatizer interface only has a lemmatize() method - the decode()
> method is only available in LemmatizerME. Also, the LemmaSample JavaDoc
> doesn't indicate at all that the lemmas need to be encoded.
>
> IMHO it would be much less confusing to the user if LemmatizerME.train()
> did the encoding internally and if the lemmatize() method did the
> decoding internally.
>
> Anyway, the accuracy is now much better. Thanks for the tip!
>
> -- Richard
>
