Do we know if there's live interest in using the coref module -- which seems like abandonware? (I've asked this before but I still don't have a sense of the level of interest).
jds

On Tue, Oct 1, 2013 at 8:06 PM, Mark G <[email protected]> wrote:
> I've been using OpenNLP for a few years and I find the best results occur
> when the models are generated using samples of the data they will be run
> against, one of the reasons I like the Maxent approach. I am not sure
> attempting to provide models will bear much fruit, other than that users
> will no longer be afraid of the licensing issues associated with using
> them in commercial systems. I do strongly think we should provide a
> model-building framework (that calls the training API) and a default impl.
>
> Coincidentally, I have been building a framework and impl over the last
> few months that creates models by seeding an iterative process with known
> entities: iterate through a set of supplied sentences to create
> annotations, write them out, train a maxent model, load the model, create
> more annotations based on the results (there is a validation object
> involved), and so on. With this method I was able to create an NER model
> for people's names against a 200K-sentence corpus that returns acceptable
> results just by starting with a list of five highly unambiguous names. I
> will propose the framework in more detail in the coming days and supply
> my impl if everyone is interested.
>
> As for the initial question, I would like to see OpenNLP provide a
> framework for rapidly/semi-automatically building models out of user
> data, and also for performing entity resolution across documents, in
> order to assign a probability to whether the "Bob" in one document is the
> same as the "Bob" in another.
>
> MG
>
> On Tue, Oct 1, 2013 at 11:01 AM, Michael Schmitz
> <[email protected]> wrote:
> > Hi, I've used OpenNLP for a few years -- in particular the chunker,
> > POS tagger, and tokenizer. We're grateful for a high-performance
> > library with an Apache license, but one of our greatest complaints is
> > the quality of the models.
> > Yes -- we're aware we can train our own -- but most people are looking
> > for something that is good enough out of the box (we aim for this with
> > our products). I'm not surprised that volunteer engineers don't want
> > to spend their time annotating data ;-)
> >
> > I'm curious what other people see as the biggest shortcomings of
> > OpenNLP, or the most important next steps for OpenNLP. I may have an
> > opportunity to contribute to the project and I'm trying to figure out
> > where the community thinks the biggest impact could be made.
> >
> > Peace.
> > Michael Schmitz
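The seed-and-bootstrap loop Mark describes (seed with a few unambiguous names, annotate a corpus, train on the annotations, re-annotate with the resulting model, repeat) can be sketched roughly as below. This is only an illustrative toy, not Mark's actual framework or OpenNLP's API: a real implementation would train an OpenNLP maxent name-finder model on each pass, whereas here a simple learned-context lookup stands in for the model, and a capitalization check stands in for the validation object he mentions.

```python
def bootstrap_ner(sentences, seed_names, iterations=3):
    """Iteratively grow a set of person names from a handful of seeds.

    sentences: list of token lists; seed_names: initial known names.
    Each pass: (1) annotate occurrences of known names and record their
    (left, right) token contexts, (2) propose new tokens that occur in a
    learned context, (3) validate candidates before accepting them.
    """
    known = set(seed_names)
    for _ in range(iterations):
        # 1. Annotate: find known names in the corpus, learn their contexts.
        contexts = set()
        for tokens in sentences:
            for i, tok in enumerate(tokens):
                if tok in known:
                    left = tokens[i - 1] if i > 0 else "<s>"
                    right = tokens[i + 1] if i + 1 < len(tokens) else "</s>"
                    contexts.add((left, right))
        # 2. Apply the "model": propose new names seen in learned contexts.
        candidates = set()
        for tokens in sentences:
            for i, tok in enumerate(tokens):
                left = tokens[i - 1] if i > 0 else "<s>"
                right = tokens[i + 1] if i + 1 < len(tokens) else "</s>"
                if (left, right) in contexts and tok not in known:
                    candidates.add(tok)
        # 3. Validate (stand-in for the validation object): require
        #    capitalization before accepting a candidate as a name.
        new_names = {c for c in candidates if c[0].isupper()}
        if not new_names:
            break  # converged: no new annotations were produced
        known |= new_names
    return known

sentences = [
    "Dr. Alice visited the lab".split(),
    "Dr. Bob visited the clinic".split(),
    "Dr. Carol visited the ward".split(),
    "the lab visited nothing".split(),
]
print(bootstrap_ner(sentences, {"Alice"}))  # → {'Alice', 'Bob', 'Carol'}
```

Starting from the single seed "Alice", the first pass learns the context ("Dr.", "visited") and the second pass uses it to pull in "Bob" and "Carol" while the validation step keeps lowercase tokens like "lab" out. The real value of the approach, as Mark notes, is that each iteration's trained model generalizes beyond exact contexts, which a lookup table cannot.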
