Do we know if there's live interest in using the coref module -- which seems like abandonware? (I've asked this before but I still don't have a sense of the level of interest).
jds

On Tue, Oct 1, 2013 at 8:06 PM, Mark G <[email protected]> wrote:
> I've been using OpenNLP for a few years and I find the best results occur
> when the models are generated using samples of the data they will be run
> against, one of the reasons I like the Maxent approach. I am not sure
> attempting to provide models will bear much fruit, other than that users
> will no longer be afraid of the licensing issues associated with using
> them in commercial systems. I do strongly think we should provide a
> model-building framework (that calls the training API) and a default impl.
>
> Coincidentally, I have been building a framework and impl over the last
> few months that creates models by seeding an iterative process with known
> entities: iterate through a set of supplied sentences to create
> annotations, write them out, train a maxent model, load the model, create
> more annotations based on the results (there is a validation object
> involved), and so on. With this method I was able to create an NER model
> for people's names against a 200K-sentence corpus that returns acceptable
> results just by starting with a list of five highly unambiguous names. I
> will propose the framework in more detail in the coming days and supply
> my impl if everyone is interested.
>
> As for the initial question, I would like to see OpenNLP provide a
> framework for rapidly/semi-automatically building models out of user
> data, and also for performing entity resolution across documents, in
> order to assign a probability to whether the "Bob" in one document is the
> same as the "Bob" in another.
>
> MG
>
> On Tue, Oct 1, 2013 at 11:01 AM, Michael Schmitz
> <[email protected]> wrote:
> > Hi, I've used OpenNLP for a few years -- in particular the chunker,
> > POS tagger, and tokenizer. We're grateful for a high-performance
> > library with an Apache license, but one of our greatest complaints is
> > the quality of the models.
> > Yes -- we're aware we can train our own -- but most people are looking
> > for something that is good enough out of the box (we aim for this with
> > our products). I'm not surprised that volunteer engineers don't want
> > to spend their time annotating data ;-)
> >
> > I'm curious what other people see as the biggest shortcomings of
> > OpenNLP, or the most important next steps for OpenNLP. I may have an
> > opportunity to contribute to the project and I'm trying to figure out
> > where the community thinks the biggest impact could be made.
> >
> > Peace.
> > Michael Schmitz
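The seed-and-bootstrap loop Mark describes (seed with a few unambiguous names, annotate a corpus, train on the annotations, re-annotate with the resulting model, repeat) can be sketched roughly as below. This is only an illustrative toy, not Mark's actual framework or OpenNLP's API: a real implementation would train an OpenNLP maxent name-finder model on each pass, whereas here a simple learned-context lookup stands in for the model, and a capitalization check stands in for the validation object he mentions.

```python
def bootstrap_ner(sentences, seed_names, iterations=3):
    """Iteratively grow a set of person names from a handful of seeds.

    sentences: list of token lists; seed_names: initial known names.
    Each pass: (1) annotate occurrences of known names and record their
    (left, right) token contexts, (2) propose new tokens that occur in a
    learned context, (3) validate candidates before accepting them.
    """
    known = set(seed_names)
    for _ in range(iterations):
        # 1. Annotate: find known names in the corpus, learn their contexts.
        contexts = set()
        for tokens in sentences:
            for i, tok in enumerate(tokens):
                if tok in known:
                    left = tokens[i - 1] if i > 0 else "<s>"
                    right = tokens[i + 1] if i + 1 < len(tokens) else "</s>"
                    contexts.add((left, right))
        # 2. Apply the "model": propose new names seen in learned contexts.
        candidates = set()
        for tokens in sentences:
            for i, tok in enumerate(tokens):
                left = tokens[i - 1] if i > 0 else "<s>"
                right = tokens[i + 1] if i + 1 < len(tokens) else "</s>"
                if (left, right) in contexts and tok not in known:
                    candidates.add(tok)
        # 3. Validate (stand-in for the validation object): require
        #    capitalization before accepting a candidate as a name.
        new_names = {c for c in candidates if c[0].isupper()}
        if not new_names:
            break  # converged: no new annotations were produced
        known |= new_names
    return known

sentences = [
    "Dr. Alice visited the lab".split(),
    "Dr. Bob visited the clinic".split(),
    "Dr. Carol visited the ward".split(),
    "the lab visited nothing".split(),
]
print(bootstrap_ner(sentences, {"Alice"}))  # → {'Alice', 'Bob', 'Carol'}
```

Starting from the single seed "Alice", the first pass learns the context ("Dr.", "visited") and the second pass uses it to pull in "Bob" and "Carol" while the validation step keeps lowercase tokens like "lab" out. The real value of the approach, as Mark notes, is that each iteration's trained model generalizes beyond exact contexts, which a lookup table cannot.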
