RE: Name finder questions

Robert Logue Fri, 22 Apr 2016 01:17:45 -0700

Can anyone help here? I don't want to start creating a large training file and 
find out I have gone about it in the wrong way.


The resources I have been looking at are:

https://opennlp.apache.org/documentation/1.5.3/manual/opennlp.html#tools.namefind.training
http://blog.thedigitalgroup.com/sagarg/2015/10/30/open-nlp-name-finder-model-training/
http://nishutayaltech.blogspot.co.uk/2015/07/writing-custom-namefinder-model-in.html

None of these gives the answers I am looking for.

Thanks,

Robert

> From: rplo...@hotmail.co.uk
> To: users@opennlp.apache.org
> Subject: RE: Name finder questions
> Date: Wed, 20 Apr 2016 09:51:25 +0100
> 
> I have a few questions about creating my own training data for the name 
> finder. I would like to distinguish between people, organizations and 
> locations. The example in the documentation shows the tags to use for people, 
> i.e.
> 
> <START:person> Pierre Vinken <END> , 61 years old , will join the board as a 
> nonexecutive director Nov. 29 .
> 
> So would I use <START:organization><END> and <START:location><END> for 
> organizations and locations respectively? The named entity guidelines in the 
> documentation, i.e.
> 
> https://opennlp.apache.org/documentation/1.5.3/manual/opennlp.html#tools.namefind.annotation_guides
> 
> seem to show different tags being used, which has left me slightly confused 
> about which tags I should actually use.
> 
> Also, I see the 15,000 line recommendation. Is there any performance hit if 
> you use many more lines?
> 
> If I create my plain text training file as outlined above, are there any 
> other parameters that are recommended beyond the basic ones, i.e.
> 
> opennlp TokenNameFinderTrainer -model OUTPUT_FILE.bin -lang en -data TRAINING_FILE.train -encoding UTF-8
> 
> For instance, what is the -params training parameters file used for? Is it 
> necessary, and should it list the named entities I am looking for, i.e. 
> person, organization and location? If so, what format should it be in?
> 
> Sorry for the basic questions, but I can't find the answers in the 
> documentation or from a quick Google search.
> 
> Thanks,
> 
> Robert
> 
> 
> > From: rodrigo.age...@ehu.eus
> > Date: Mon, 18 Apr 2016 09:36:24 +0200
> > Subject: Re: Name finder questions
> > To: users@opennlp.apache.org
> > 
> > Hello,
> > 
> > Yes, that is the idea.
> > 
> > R
> > 
> > On Sun, Apr 17, 2016 at 9:10 PM, Robert Logue <rplo...@hotmail.co.uk> wrote:
> > > I am slightly confused about what I can use the data in those links for. 
> > > Can I use this data with the training tool, like the following?
> > >
> > > opennlp TokenNameFinderTrainer -model OUTPUT_FILE_NAME -lang en -data DOWNLOADED_FILE_NAME -encoding UTF-8
> > >
> > > And should that give me a better model file for when I use the name 
> > > finder?
> > >
> > > Thanks,
> > >
> > > Robert
> > >
> > >> From: rodrigo.age...@ehu.eus
> > >> Date: Fri, 15 Apr 2016 17:12:20 +0200
> > >> Subject: Re: Name finder questions
> > >> To: users@opennlp.apache.org
> > >>
> > >> Hi Robert,
> > >>
> > >> On Fri, Apr 15, 2016 at 10:25 AM, Robert Logue <rplo...@hotmail.co.uk> 
> > >> wrote:
> > >> > Hello,
> > >> >
> > >> > I have just started using OpenNLP in a Java application. I am just 
> > >> > getting used to the software and have a couple of newbie 
> > >> > questions.
> > >> >
> > >> > I see that for the name finder there is separate model data for people 
> > >> > and organizations (en-ner-organization.bin and en-ner-person.bin). Is 
> > >> > there any way to combine these into one file so I can do one search 
> > >> > that will give me back person names and organization names? Or is this 
> > >> > not possible, and is it best to do two searches?
> > >>
> > >> This used to be experimental. It is not anymore: you can train
> > >> a name finder model for more than one entity type. The available
> > >> models were trained with rather old newswire data, so I would
> > >> recommend that you train new models using OpenNLP:
> > >>
> > >> http://opennlp.apache.org/documentation/1.6.0/manual/opennlp.html#tools.namefind.training.tool
> > >>
> > >> I suppose you do not have manually annotated training data, so I would
> > >> recommend getting the OntoNotes corpus.
> > >>
> > >> https://catalog.ldc.upenn.edu/LDC2013T19
> > >>
> > >> https://github.com/ontonotes/conll-formatted-ontonotes-5.0
> > >>
> > >> Another option is to get a silver standard corpus obtained
> > >> automatically from Wikipedia:
> > >>
> > >> http://schwa.org/projects/resources/wiki/Wikiner#Automatic-training-data-from-Wikipedia
> > >>
> > >> For Dutch, Spanish, German and Italian (that I know of) there are free
> > >> resources. Search for Ancora, SONAR-1, GermEval 2014 and Evalita 2009.
> > >>
> > >> > This question isn't related to the name finder and I don't think it is 
> > >> > possible but thought I would ask anyway. If I had two sentences say 
> > >> > 'Jack climbed the hill. He was very tired.' Is there any way to know 
> > >> > that the pronoun, he, at the start of the second sentence is actually 
> > >> > about Jack the subject of the first sentence? I know in this simple 
> > >> > case it is obvious but I am wondering if there is anything in the 
> > >> > OpenNLP software that will help with this?
> > >>
> > >> The example you mentioned is called "pronominal anaphora" and it
> > >> generalizes to the coreference resolution problem. There used to be a
> > >> coreference tool in OpenNLP, but it was moved to the Sandbox because many
> > >> things need to be updated before it can be distributed.
> > >>
> > >> See http://conll.cemantix.org/2012/introduction.html for more details.
> > >>
> > >> HTH,
> > >>
> > >> R
> > >
>
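
For anyone checking a hand-built training file before committing to thousands of lines, here is a minimal sketch of a sanity check for the annotation format asked about in the thread above. It is a standalone Python script, not part of OpenNLP. It assumes one whitespace-tokenized sentence per line with <START:type> ... <END> spans, as in the Pierre Vinken example. The type names organization and location are the ones proposed in the question, the second and third sample sentences are invented for illustration, and the names check_line and summarize are hypothetical.

```python
import re
from collections import Counter

# Tag shapes taken from the example in the thread: <START:type> ... <END>.
START_TAG = re.compile(r"^<START(?::([^>\s]+))?>$")
END_TAG = "<END>"


def check_line(line):
    """Return (entities, errors) for one whitespace-tokenized sentence.

    entities is a list of (type, text) pairs; errors is a list of messages.
    """
    entities, errors = [], []
    open_type = None   # type of the span currently open, if any
    span_tokens = []   # tokens collected inside the open span
    for token in line.split():
        match = START_TAG.match(token)
        if match:
            if open_type is not None:
                errors.append("nested or unclosed <START> before " + token)
            # "default" is a placeholder label this sketch uses for an untyped <START>.
            open_type = match.group(1) or "default"
            span_tokens = []
        elif token == END_TAG:
            if open_type is None:
                errors.append("<END> without a matching <START>")
            elif not span_tokens:
                errors.append("empty span for type " + open_type)
            else:
                entities.append((open_type, " ".join(span_tokens)))
            open_type = None
        elif "<START" in token or "<END>" in token:
            # A tag glued to a word, e.g. "<START:person>Pierre" or "Vinken<END>".
            errors.append("tag not separated by spaces: " + token)
        elif open_type is not None:
            span_tokens.append(token)
    if open_type is not None:
        errors.append("missing <END> for type " + open_type)
    return entities, errors


def summarize(lines):
    """Count entities per type and collect (line_number, message) problems."""
    counts = Counter()
    problems = []
    for number, line in enumerate(lines, start=1):
        entities, errors = check_line(line)
        counts.update(entity_type for entity_type, _ in entities)
        problems.extend((number, message) for message in errors)
    return counts, problems


sample = [
    "<START:person> Pierre Vinken <END> , 61 years old , will join the board "
    "as a nonexecutive director Nov. 29 .",
    # Invented sentences, for illustration only.
    "<START:organization> Acme Corp <END> opened an office in "
    "<START:location> Glasgow <END> .",
    "<START:person>Jack <END> climbed the hill .",
]

counts, problems = summarize(sample)
print(dict(counts))
for number, message in problems:
    print(f"line {number}: {message}")
```

To run it over a real file, pass the open file object to summarize, for example summarize(open("TRAINING_FILE.train", encoding="utf-8")). A tag written without surrounding spaces is reported as a problem and not counted, which makes that kind of slip visible early.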
