+1 to make this easier. I also had a look at the interface and it should be possible to use LemmatizerME just via the Lemmatizer interface.
Would it be possible to do the encoding/decoding internally, like Richard suggests?

Jörn

On Wed, Jan 25, 2017 at 10:59 PM, Richard Eckart de Castilho <r...@apache.org> wrote:
> On 25.01.2017, at 18:38, Rodrigo Agerri <rage...@apache.org> wrote:
> >
> > When I run the OpenNLP model just trained, these two words get lemmatized as
> > expected (constituent and dependency). I guess something went wrong in your
> > training process.
>
> I figured out what went wrong. I implemented my own LemmaSampleStream and,
> inside that, I didn't call getShortestEditScript(). I also didn't decode
> the output explicitly.
>
> The OpenNLP Lemmatizer API doesn't really indicate that these extra steps
> are necessary. I remembered having done them when wrapping the original
> IXA implementation, but given the API design in OpenNLP, it had appeared
> to me that this was no longer necessary with the OpenNLP implementation.
>
> The Lemmatizer interface only has a lemmatize() method - the decode() method
> is only available in LemmatizerME. Also, the LemmaSample JavaDoc doesn't
> indicate at all that the lemmas need to be encoded.
>
> IMHO it would be much less confusing to the user if LemmatizerME.train()
> did the encoding internally and if the lemmatize() method did the decoding
> internally.
>
> Anyway, the accuracy is now much better. Thanks for the tip!
>
> -- Richard
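For readers following along: the confusion above is about the lemmatizer predicting an *encoded* edit script (how to transform the word form into the lemma) rather than the lemma string itself, so training data must be encoded and predictions decoded. The sketch below is only an illustration of that idea under assumptions of mine - the class `LemmaCodecSketch` and its methods are hypothetical and are not the OpenNLP API, and the encoding shown (a simple "drop N characters, append suffix" script) is a toy stand-in for OpenNLP's shortest-edit-script representation:

```java
// Illustrative sketch only: a minimal suffix-replacement "edit script",
// in the spirit of the encoding OpenNLP's lemmatizer uses internally.
// Class and method names are hypothetical, not part of the OpenNLP API.
public class LemmaCodecSketch {

    // Encode a (word, lemma) pair as "<charsToDrop>|<suffixToAppend>",
    // based on the longest common prefix of the two strings.
    static String encode(String word, String lemma) {
        int p = 0;
        int max = Math.min(word.length(), lemma.length());
        while (p < max && word.charAt(p) == lemma.charAt(p)) {
            p++;
        }
        return (word.length() - p) + "|" + lemma.substring(p);
    }

    // Apply an encoded script back to the surface form to recover the lemma.
    static String decode(String word, String script) {
        int sep = script.indexOf('|');
        int drop = Integer.parseInt(script.substring(0, sep));
        return word.substring(0, word.length() - drop) + script.substring(sep + 1);
    }

    public static void main(String[] args) {
        // "dependencies" -> "dependency": drop 3 chars ("ies"), append "y".
        String script = encode("dependencies", "dependency");
        System.out.println(script);                          // 3|y
        System.out.println(decode("dependencies", script));  // dependency
        // Round trip for the other example word from the thread.
        System.out.println(decode("constituents",
                encode("constituents", "constituent")));     // constituent
    }
}
```

A custom sample stream would run something like `encode()` over its gold lemmas before training, and callers would run something like `decode()` over the model's predictions - exactly the two steps the thread says are easy to miss because the `Lemmatizer` interface does not surface them.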