+1 Yes, I will take a look at it.
On Thu, Jan 26, 2017 at 11:08 AM, Joern Kottmann <kottm...@gmail.com> wrote:

> +1 to make this easier. I also had a look at the interface and it should be
> possible to use LemmatizerME just via the Lemmatizer interface.
>
> Would it be possible to do the encoding/decoding internally like Richard
> suggests?
>
> Jörn
>
> On Wed, Jan 25, 2017 at 10:59 PM, Richard Eckart de Castilho <r...@apache.org> wrote:
>
> > On 25.01.2017, at 18:38, Rodrigo Agerri <rage...@apache.org> wrote:
> > >
> > > When I run the OpenNLP model just trained, these two words get lemmatized
> > > as expected (constituent and dependency). I guess something went wrong in
> > > your training process.
> >
> > I figured out what went wrong. I implemented my own LemmaSampleStream and,
> > inside it, I didn't call getShortestEditScript(). I also didn't decode
> > the output explicitly.
> >
> > The OpenNLP Lemmatizer API doesn't really indicate that these extra steps
> > are necessary. I remember having done them when wrapping the original
> > IXA implementation, but given the API design in OpenNLP, it had appeared
> > to me that this was no longer necessary with the OpenNLP implementation.
> >
> > The Lemmatizer interface only has a lemmatize() method; the decode()
> > method is only available in LemmatizerME. Also, the LemmaSample JavaDoc
> > doesn't indicate at all that the lemmas need to be encoded.
> >
> > IMHO it would be much less confusing to the user if LemmatizerME.train()
> > did the encoding internally and if the lemmatize() method did the
> > decoding internally.
> >
> > Anyway, the accuracy is now much better. Thanks for the tip!
> >
> > -- Richard
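For context, the encode/decode round-trip being discussed can be sketched roughly like this. This is a toy illustration of a suffix edit-script lemma encoding ("strip N trailing characters, then append a suffix"), not OpenNLP's actual getShortestEditScript() format; the class name and script format here are made up for the example:

```java
// Toy sketch of lemma encoding as a suffix edit script: "strip N chars,
// then append SUFFIX". A simplified stand-in for the kind of encoded
// lemma a LemmaSample/LemmatizerME training pipeline works with; the
// real shortest-edit-script representation in OpenNLP is more elaborate.
public class EditScriptSketch {

    // Encode the transformation from a surface form to its lemma.
    public static String encode(String form, String lemma) {
        int common = 0;
        while (common < form.length() && common < lemma.length()
                && form.charAt(common) == lemma.charAt(common)) {
            common++;
        }
        int strip = form.length() - common;        // chars to remove from the end
        String append = lemma.substring(common);   // suffix to add afterwards
        return strip + "|" + append;
    }

    // Apply an edit script back to a surface form to recover the lemma.
    public static String decode(String form, String script) {
        String[] parts = script.split("\\|", -1);
        int strip = Integer.parseInt(parts[0]);
        return form.substring(0, form.length() - strip) + parts[1];
    }

    public static void main(String[] args) {
        String script = encode("walked", "walk");
        System.out.println(script);                   // 2|
        System.out.println(decode("walked", script)); // walk
        System.out.println(decode("mice", encode("mice", "mouse"))); // mouse
    }
}
```

The point of the thread is that a caller using only the Lemmatizer interface never sees that predictions come back in such an encoded form, which is why doing the decode step inside lemmatize() (and the encode step inside train()) would make the API harder to misuse.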