Re: OpenNLP tool for NameFinder

Alexandre Patry Mon, 20 Jun 2011 08:39:38 -0700

Maybe you do not need to use NLP for your task. Recipe websites often render all recipes using similar HTML structures, so it can be simpler to just write a program for each website that extracts the recipe title from the HTML DOM.

I do not know which websites you want to extract recipes from, but if they use the hRecipe micro-format[1], the same extraction code will work everywhere.
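A minimal sketch of the hRecipe approach in Java, using only the JDK's DOM and XPath APIs. It assumes the page is well-formed XHTML without a DOCTYPE (most real pages are not and would first need a lenient HTML parser). It relies on hRecipe marking the title with class `fn` inside an element with class `hrecipe`. The names `HRecipeTitles` and `extractTitles` are illustrative. `fn` elements inside a `vcard` are skipped, because an author marked up with hCard also uses `fn`.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class HRecipeTitles {

    // XPath predicate: true if the element's class attribute contains the given class name
    private static String hasClass(String name) {
        return "contains(concat(' ', normalize-space(@class), ' '), ' " + name + " ')";
    }

    /** Returns the text of every .fn element inside an .hrecipe element, ignoring hCard names. */
    public static List<String> extractTitles(String xhtml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xhtml)));
        String expr = "//*[" + hasClass("hrecipe") + "]//*[" + hasClass("fn") + "]"
                + "[not(ancestor::*[" + hasClass("vcard") + "])]";
        NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate(expr, doc, XPathConstants.NODESET);
        List<String> titles = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            titles.add(nodes.item(i).getTextContent().trim());
        }
        return titles;
    }

    public static void main(String[] args) throws Exception {
        String page = "<html><body>"
                + "<div class=\"hrecipe\"><h2 class=\"fn\">Shortbread</h2>"
                + "<span class=\"ingredient\">butter</span></div>"
                + "</body></html>";
        System.out.println(extractTitles(page)); // [Shortbread]
    }
}
```

Because the extraction keys only on class names, the same code applies to any site that publishes hRecipe markup. Sites without it would need a per-site XPath expression instead.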


Hth,

Alexandre

[1] http://microformats.org/wiki/hrecipe

On 11-06-20 11:31 AM, Amal Elmah wrote:

thanks for replying

What I need to do is build a new model that can extract recipe names from a specific cooking website.
Could you please correct me if I have made any mistakes:

- First, I made a training file (training.txt) with many sentences that contain a recipe name, one sentence per line, for example:
<START> Shortbread <END> is an easy buttery biscuits as homemade Christmas presents .
... etc

- Then I use the command line training tool to generate the new model.
- After that, I will use this model in my application to process any new page from this cooking website.
- The features will be extracted automatically by OpenNLP, so I do not need to specify them; I just need to provide as much training data as I can (this is what I understood).
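The training-file step can be sketched as a small helper that turns a tokenized sentence plus the position of the recipe name into one training line. The helper `annotate`, the class `TrainingLineWriter` and the entity type name `recipe` are all assumptions for illustration. The main point is that the opening and closing tags are separate whitespace-delimited tokens. The sentence should also be tokenized the same way the application will tokenize new pages.

```java
import java.util.StringJoiner;

public class TrainingLineWriter {

    /**
     * Hypothetical helper: wraps tokens[start..end-1] in an opening START tag of the
     * given type and a closing END tag, and returns one whitespace-separated line.
     */
    public static String annotate(String[] tokens, int start, int end, String type) {
        if (start < 0 || end > tokens.length || start >= end) {
            throw new IllegalArgumentException("invalid span [" + start + ", " + end + ")");
        }
        StringJoiner line = new StringJoiner(" ");
        for (int i = 0; i < tokens.length; i++) {
            if (i == start) {
                line.add("<START:" + type + ">"); // opening tag is its own token
            }
            line.add(tokens[i]);
            if (i == end - 1) {
                line.add("<END>"); // closing tag is its own token
            }
        }
        return line.toString();
    }

    public static void main(String[] args) {
        // Assumes the sentence is already tokenized (here: split on single spaces).
        String[] tokens = "Shortbread is an easy buttery biscuits as homemade Christmas presents ."
                .split(" ");
        System.out.println(annotate(tokens, 0, 1, "recipe"));
        // <START:recipe> Shortbread <END> is an easy buttery biscuits as homemade Christmas presents .
    }
}
```

Multi-word names work the same way: `annotate(new String[]{"Try", "Lemon", "Tart", "today"}, 1, 3, "recipe")` yields `Try <START:recipe> Lemon Tart <END> today`.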

Are all my steps right?
Do I need to do anything to make the results more accurate?
I appreciate your help

Best,
Amal

From: olivier.gri...@ensta.org
Date: Mon, 20 Jun 2011 10:06:23 +0200
Subject: Re: OpenNLP tool for NameFinder
To: opennlp-users@incubator.apache.org

2011/6/20 Amal Elmah <amalalthougha...@hotmail.com>:

Hi OpenNLP team,

I used the command line training tool for NameFinder with the following command:
$ bin/opennlp TokenNameFinderTrainer -encoding UTF-8 -lang en -data en-ner-person.train -model en-ner-person.bin

I do not know where to get en-ner-person.train, so I made a training file (training.txt) and added training data as follows:

<START:person> Pierre Vinken <END> , 61 years old , will join the board as a nonexecutive director Nov. 29 .
Mr . <START:person> Vinken <END> is chairman of Elsevier N.V. , the Dutch publishing group .
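A training file in this format can be sanity-checked before running the trainer. A standalone reader splits each line on whitespace and recovers the plain tokens and the name spans. This is not OpenNLP's own parser, only an approximation of the format. `AnnotatedLineReader` and its records are illustrative names, and the records require Java 16 or later. The reader rejects tags glued to a word, such as `Vinken<END>`, because a whitespace-based reader cannot recognize them as tags.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AnnotatedLineReader {

    /** A name covering token indices [start, end); type is null for a bare START tag. */
    public record Span(int start, int end, String type) {}

    /** The plain tokens of a line plus the annotated name spans. */
    public record Parsed(List<String> tokens, List<Span> names) {}

    // Matches "<START>" or "<START:type>" as a whole token.
    private static final Pattern START = Pattern.compile("<START(?::([^<>:\\s]*))?>");

    public static Parsed parse(String line) {
        List<String> tokens = new ArrayList<>();
        List<Span> names = new ArrayList<>();
        int openStart = -1;     // token index where the open name begins, -1 if none
        String openType = null;
        for (String part : line.trim().split("\\s+")) {
            Matcher m = START.matcher(part);
            if (m.matches()) {
                if (openStart >= 0) {
                    throw new IllegalArgumentException("nested <START> in: " + line);
                }
                openStart = tokens.size();
                openType = m.group(1); // null when the tag has no type
            } else if (part.equals("<END>")) {
                if (openStart < 0) {
                    throw new IllegalArgumentException("<END> without <START> in: " + line);
                }
                names.add(new Span(openStart, tokens.size(), openType));
                openStart = -1;
                openType = null;
            } else if (part.contains("<START") || part.contains("<END>")) {
                // e.g. "Vinken<END>": the tag is glued to a word instead of standing alone
                throw new IllegalArgumentException("tag not separated by whitespace: " + part);
            } else {
                tokens.add(part);
            }
        }
        if (openStart >= 0) {
            throw new IllegalArgumentException("unclosed <START> in: " + line);
        }
        return new Parsed(tokens, names);
    }

    public static void main(String[] args) {
        Parsed p = parse("Mr . <START:person> Vinken <END> is chairman of Elsevier N.V. , the Dutch publishing group .");
        System.out.println(p.names()); // [Span[start=2, end=3, type=person]]
    }
}
```

Running every line of training.txt through `parse` and reporting the exceptions is a cheap way to catch annotation mistakes before training.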

My Questions are:
1- How can I add features if I want to use the command line training tool rather than the API? Can you please give me an example, if this is possible?

AFAIK in the current state feature extraction is only customizable
through the API.

2- Can we add features to the training data, I mean with an annotation like <START:person feature=value>?

No. What would be the use case? Can you give a concrete example of
such a manual feature annotation? What goal do you want to achieve
with such annotations?

3- Does the OpenNLP tool have a way to generate these features automatically from the training data?

OpenNLP already generates its features automatically by combining
several feature extractors, as in:

https://svn.apache.org/repos/asf/incubator/opennlp/trunk/opennlp-tools/src/main/java/opennlp/tools/namefind/DefaultNameContextGenerator.java

None of those feature extractors expects any kind of manual
annotation. This is expected, since in general the text you want to
analyze with a NameFinder instance will not have any annotations.
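The idea of combining several feature extractors over plain tokens can be illustrated with a standalone sketch. The `FeatureExtractor` interface and the three extractors below are hypothetical simplifications, not OpenNLP's classes. They produce the kinds of features such generators typically emit: the current token, a coarse token shape, and the neighboring tokens. Every feature is computed from the unannotated tokens alone, which is why no manual feature annotation is needed. In OpenNLP itself, changing this set of extractors is done through the API.

```java
import java.util.ArrayList;
import java.util.List;

public class FeatureSketch {

    /** Local stand-in for a feature extractor; NOT OpenNLP's interface. */
    interface FeatureExtractor {
        void addFeatures(List<String> features, String[] tokens, int index);
    }

    /** The current token, lowercased. */
    static final FeatureExtractor TOKEN = (f, t, i) -> f.add("w=" + t[i].toLowerCase());

    /** A coarse shape class: all digits, capitalized, or other. */
    static final FeatureExtractor TOKEN_CLASS = (f, t, i) -> {
        String w = t[i];
        String cls = w.matches("\\d+") ? "num"
                : Character.isUpperCase(w.charAt(0)) ? "cap" : "other";
        f.add("c=" + cls);
    };

    /** Previous and next token (window of size 1), with sentence-boundary markers. */
    static final FeatureExtractor WINDOW = (f, t, i) -> {
        f.add("p=" + (i > 0 ? t[i - 1].toLowerCase() : "BOS"));
        f.add("n=" + (i < t.length - 1 ? t[i + 1].toLowerCase() : "EOS"));
    };

    /** Runs every extractor for one token of a plain, unannotated sentence. */
    static List<String> context(String[] tokens, int index, List<FeatureExtractor> extractors) {
        List<String> features = new ArrayList<>();
        for (FeatureExtractor e : extractors) {
            e.addFeatures(features, tokens, index);
        }
        return features;
    }

    public static void main(String[] args) {
        String[] tokens = {"Mr", ".", "Vinken", "is", "chairman"};
        System.out.println(context(tokens, 2, List.of(TOKEN, TOKEN_CLASS, WINDOW)));
        // [w=vinken, c=cap, p=., n=is]
    }
}
```

Adding a domain-specific extractor (for example, one that flags words such as "cake" or "bread") would only mean adding another entry to the list passed to `context`.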

--
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel
