Can anyone help here? I don't want to start creating a large training file and find out I have gone about it in the wrong way.
The resources I have been looking at are:

https://opennlp.apache.org/documentation/1.5.3/manual/opennlp.html#tools.namefind.training
http://blog.thedigitalgroup.com/sagarg/2015/10/30/open-nlp-name-finder-model-training/
http://nishutayaltech.blogspot.co.uk/2015/07/writing-custom-namefinder-model-in.html

None of them gives the answers I am looking for.

Thanks,
Robert

> From: rplo...@hotmail.co.uk
> To: users@opennlp.apache.org
> Subject: RE: Name finder questions
> Date: Wed, 20 Apr 2016 09:51:25 +0100
>
> I have a few questions regarding creating my own training data for the name
> finder. I would like to distinguish between people, organizations and
> locations. The example in the documentation shows the tags to use for people,
> i.e.
>
> <START:person> Pierre Vinken <END> , 61 years old , will join the board as a
> nonexecutive director Nov. 29 .
>
> So would I use <START:organization><END> and <START:location><END> for
> organizations and locations respectively? The named entity guidelines in the
> documentation, i.e.
>
> https://opennlp.apache.org/documentation/1.5.3/manual/opennlp.html#tools.namefind.annotation_guides
>
> seem to show different tags being used, which has confused me slightly as to
> which tags I should actually use.
>
> Also, I see the 15,000 line recommendation. Is there any performance hit if
> you use many more lines?
>
> If I create my plain text training file as I outlined above, are there any
> other params that are recommended beyond the basic ones, i.e.
>
> opennlp TokenNameFinderTrainer -model OUTPUT_FILE.bin -lang en -data TRAINING_FILE.train -encoding UTF-8
>
> For instance, what is the -params training parameters file used for? Is it
> necessary? Should it list the named entities I am looking for, i.e. person,
> organization and location? If so, what format should it be in?
>
> Sorry for the basic questions here, but I can't find the answers in the
> documentation or from a quick Google search.
>
> Thanks,
>
> Robert
>
> > From: rodrigo.age...@ehu.eus
> > Date: Mon, 18 Apr 2016 09:36:24 +0200
> > Subject: Re: Name finder questions
> > To: users@opennlp.apache.org
> >
> > Hello,
> >
> > Yes, that is the idea.
> >
> > R
> >
> > On Sun, Apr 17, 2016 at 9:10 PM, Robert Logue <rplo...@hotmail.co.uk> wrote:
> > > I am slightly confused about what I can use the data in those links for.
> > > So can I use this data with the training tool like the following?
> > >
> > > opennlp TokenNameFinderTrainer -model OUTPUT_FILE_NAME -lang en -data DOWNLOADED_FILE_NAME -encoding UTF-8
> > >
> > > And that should give me a better model file for when I use the name
> > > finder?
> > >
> > > Thanks,
> > >
> > > Robert
> > >
> > >> From: rodrigo.age...@ehu.eus
> > >> Date: Fri, 15 Apr 2016 17:12:20 +0200
> > >> Subject: Re: Name finder questions
> > >> To: users@opennlp.apache.org
> > >>
> > >> Hi Robert,
> > >>
> > >> On Fri, Apr 15, 2016 at 10:25 AM, Robert Logue <rplo...@hotmail.co.uk>
> > >> wrote:
> > >> > Hello,
> > >> >
> > >> > I have just started using OpenNLP in a Java application. I am just
> > >> > getting used to the software and have a couple of newbie questions.
> > >> >
> > >> > I see for the name finder there is different model data for people and
> > >> > organizations (en-ner-organization.bin and en-ner-person.bin). Is
> > >> > there any way to combine these into one file so I can do one search
> > >> > that will give me back both person names and organization names? Or is
> > >> > this not possible, and is it best to do two searches?
> > >>
> > >> This used to be experimental. It is not anymore, namely, you can train
> > >> a name finder model for more than one entity type. The models
> > >> available were trained with rather old newswire data, so I would
> > >> recommend that you train new models using OpenNLP:
> > >>
> > >> http://opennlp.apache.org/documentation/1.6.0/manual/opennlp.html#tools.namefind.training.tool
> > >>
> > >> I suppose you do not have manually annotated training data, so I would
> > >> recommend getting the OntoNotes corpus:
> > >>
> > >> https://catalog.ldc.upenn.edu/LDC2013T19
> > >>
> > >> https://github.com/ontonotes/conll-formatted-ontonotes-5.0
> > >>
> > >> Another option is to get a silver-standard corpus obtained
> > >> automatically from Wikipedia:
> > >>
> > >> http://schwa.org/projects/resources/wiki/Wikiner#Automatic-training-data-from-Wikipedia
> > >>
> > >> For Dutch, Spanish, German and Italian (that I know of) there are free
> > >> resources. Search for Ancora, SONAR-1, GermEval 2014 and Evalita 2009.
> > >>
> > >> > This question isn't related to the name finder and I don't think it is
> > >> > possible, but I thought I would ask anyway. If I had two sentences,
> > >> > say 'Jack climbed the hill. He was very tired.', is there any way to
> > >> > know that the pronoun "he" at the start of the second sentence is
> > >> > actually about Jack, the subject of the first sentence? I know in this
> > >> > simple case it is obvious, but I am wondering if there is anything in
> > >> > the OpenNLP software that will help with this.
> > >>
> > >> The example you mentioned is called "pronominal anaphora", and it
> > >> generalizes to the coreference resolution problem. There used to be a
> > >> coreference tool in OpenNLP, but it was moved to the Sandbox because
> > >> many things need to be updated before it can be distributed.
> > >>
> > >> See http://conll.cemantix.org/2012/introduction.html for more details.
> > >>
> > >> HTH,
> > >>
> > >> R
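Since the worry in this thread is building a large training file the wrong way, a quick check of the tag format before running `opennlp TokenNameFinderTrainer` may help. The sketch below is a hypothetical helper (`TrainingLineChecker` is not part of OpenNLP) that assumes each line uses the `<START:type> ... <END>` style from the Pierre Vinken example, with tags written as separate whitespace-separated tokens. It reports every annotated entity and its type, so a single file mixing person, organization and location annotations (the multi-type model Rodrigo describes) can be sanity-checked. The Robert/Acme Corp/London line is made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical helper (not part of OpenNLP): sanity-checks one line of name
 * finder training data in the "<START:type> ... <END>" style and returns the
 * annotated entities. Assumes tags are whitespace-separated tokens, as in the
 * Pierre Vinken example.
 */
public class TrainingLineChecker {

    /** One annotated entity: its type and the tokens between the tags. */
    public static final class Span {
        public final String type;
        public final String text;

        Span(String type, String text) {
            this.type = type;
            this.text = text;
        }
    }

    // Matches a complete start tag such as <START:person> and captures the type.
    private static final Pattern START = Pattern.compile("<START:([^>\\s]+)>");

    public static List<Span> parse(String line) {
        List<Span> spans = new ArrayList<>();
        String openType = null;                 // type of the entity currently open, if any
        StringBuilder current = new StringBuilder();

        for (String token : line.trim().split("\\s+")) {
            Matcher m = START.matcher(token);
            if (m.matches()) {
                if (openType != null) {
                    throw new IllegalArgumentException("Nested start tag: " + token);
                }
                openType = m.group(1);
                current.setLength(0);
            } else if (token.equals("<END>")) {
                if (openType == null || current.length() == 0) {
                    throw new IllegalArgumentException("<END> without an open, non-empty entity");
                }
                spans.add(new Span(openType, current.toString()));
                openType = null;
            } else if (token.contains("<START") || token.contains("<END")) {
                // e.g. "<START:organization><END>" written as one token
                throw new IllegalArgumentException("Tag not separated by spaces: " + token);
            } else if (openType != null) {
                // Token inside an entity: collect it with single-space separators.
                if (current.length() > 0) {
                    current.append(' ');
                }
                current.append(token);
            }
            // Tokens outside any entity are ignored.
        }
        if (openType != null) {
            throw new IllegalArgumentException("Unclosed start tag for type: " + openType);
        }
        return spans;
    }

    public static void main(String[] args) {
        // Made-up line mixing three entity types in one training sentence.
        String line = "<START:person> Robert <END> works for <START:organization> Acme Corp <END>"
                + " in <START:location> London <END> .";
        for (Span s : parse(line)) {
            System.out.println(s.type + " -> " + s.text);
        }
        // prints:
        // person -> Robert
        // organization -> Acme Corp
        // location -> London
    }
}
```

Running it over each line of a draft .train file should surface unbalanced, nested or space-less tags early. It only checks tag balance and spacing under the assumptions above; it says nothing about whether the trainer will accept the rest of the file.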