+1 Yes, I will take a look at it.
On Thu, Jan 26, 2017 at 11:08 AM, Joern Kottmann <kottm...@gmail.com> wrote:

> +1 to make this easier. I also had a look at the interface and it should be
> possible to use LemmatizerME just via the Lemmatizer interface.
>
> Would it be possible to do the encoding/decoding internally like Richard
> suggests?
>
> Jörn
>
> On Wed, Jan 25, 2017 at 10:59 PM, Richard Eckart de Castilho <r...@apache.org> wrote:
>
> > On 25.01.2017, at 18:38, Rodrigo Agerri <rage...@apache.org> wrote:
> > >
> > > When I run the OpenNLP model just trained, these two words get lemmatized
> > > as expected (constituent and dependency). I guess something went wrong in
> > > your training process.
> >
> > I figured out what went wrong. I implemented my own LemmaSampleStream and,
> > inside it, I didn't call getShortestEditScript(). I also didn't decode
> > the output explicitly.
> >
> > The OpenNLP Lemmatizer API doesn't really indicate that these extra steps
> > are necessary. I remember having done them when wrapping the original
> > IXA implementation, but given the API design in OpenNLP, it had appeared
> > to me that this was no longer necessary with the OpenNLP implementation.
> >
> > The Lemmatizer interface only has a lemmatize() method; the decode()
> > method is only available in LemmatizerME. Also, the LemmaSample JavaDoc
> > doesn't indicate at all that the lemmas need to be encoded.
> >
> > IMHO it would be much less confusing to the user if LemmatizerME.train()
> > did the encoding internally and if the lemmatize() method did the
> > decoding internally.
> >
> > Anyway, the accuracy is now much better. Thanks for the tip!
> >
> > -- Richard
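For context, the encode/decode round-trip being discussed can be sketched roughly like this. This is a toy illustration of a suffix edit-script lemma encoding ("strip N trailing characters, then append a suffix"), not OpenNLP's actual getShortestEditScript() format; the class name and script format here are made up for the example:

```java
// Toy sketch of lemma encoding as a suffix edit script: "strip N chars,
// then append SUFFIX". A simplified stand-in for the kind of encoded
// lemma a LemmaSample/LemmatizerME training pipeline works with; the
// real shortest-edit-script representation in OpenNLP is more elaborate.
public class EditScriptSketch {

    // Encode the transformation from a surface form to its lemma.
    public static String encode(String form, String lemma) {
        int common = 0;
        while (common < form.length() && common < lemma.length()
                && form.charAt(common) == lemma.charAt(common)) {
            common++;
        }
        int strip = form.length() - common;        // chars to remove from the end
        String append = lemma.substring(common);   // suffix to add afterwards
        return strip + "|" + append;
    }

    // Apply an edit script back to a surface form to recover the lemma.
    public static String decode(String form, String script) {
        String[] parts = script.split("\\|", -1);
        int strip = Integer.parseInt(parts[0]);
        return form.substring(0, form.length() - strip) + parts[1];
    }

    public static void main(String[] args) {
        String script = encode("walked", "walk");
        System.out.println(script);                   // 2|
        System.out.println(decode("walked", script)); // walk
        System.out.println(decode("mice", encode("mice", "mouse"))); // mouse
    }
}
```

The point of the thread is that a caller using only the Lemmatizer interface never sees that predictions come back in such an encoded form, which is why doing the decode step inside lemmatize() (and the encode step inside train()) would make the API harder to misuse.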