Maybe you do not need NLP for this task. Recipe websites often render
all their recipes with similar HTML structures, so it can be simpler to
write a small program for each website that extracts the recipe
title from the HTML DOM.
I do not know which websites you want to extract recipes from, but if
they use the hRecipe microformat [1], the same extraction code will
work on all of them.
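To illustrate the DOM approach, here is a minimal sketch in Python using only the standard library. It relies on the hRecipe convention that a recipe is wrapped in an element with class "hrecipe" and its name sits in a nested element with class "fn". The names HRecipeTitleParser, extract_recipe_titles and SAMPLE_PAGE are made up for this example, and the sketch assumes reasonably well-formed HTML (every non-void element is closed explicitly).

```python
from html.parser import HTMLParser

# Void elements never get a closing tag, so they must not affect depth tracking.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class HRecipeTitleParser(HTMLParser):
    """Collects the text of each class="fn" element nested in a class="hrecipe" element."""

    def __init__(self):
        super().__init__()
        self.depth = 0            # nesting depth of currently open elements
        self.recipe_depth = None  # depth at which the enclosing hrecipe element opened
        self.fn_depth = None      # depth at which the current fn element opened
        self.buffer = []          # text chunks seen inside the current fn element
        self.titles = []          # extracted recipe titles

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        self.depth += 1
        classes = (dict(attrs).get("class") or "").split()
        if self.recipe_depth is None and "hrecipe" in classes:
            self.recipe_depth = self.depth
        if (self.recipe_depth is not None and self.fn_depth is None
                and "fn" in classes):
            self.fn_depth = self.depth
            self.buffer = []

    def handle_data(self, data):
        if self.fn_depth is not None:
            self.buffer.append(data)

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self.fn_depth == self.depth:
            # Normalize internal whitespace before storing the title.
            self.titles.append(" ".join("".join(self.buffer).split()))
            self.fn_depth = None
        if self.recipe_depth == self.depth:
            self.recipe_depth = None
        self.depth -= 1


def extract_recipe_titles(html):
    """Return the list of hRecipe titles found in an HTML string."""
    parser = HRecipeTitleParser()
    parser.feed(html)
    parser.close()
    return parser.titles


# Hypothetical page fragment, used only to exercise the sketch.
SAMPLE_PAGE = """
<div class="hrecipe">
  <h1 class="fn">Shortbread</h1>
  <p class="summary">An easy, buttery biscuit.</p>
</div>
"""

print(extract_recipe_titles(SAMPLE_PAGE))
```

For sites that do not use hRecipe, the same idea applies, but the class names to look for would have to be adapted per site.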
Hth,
Alexandre
[1] http://microformats.org/wiki/hrecipe
On 11-06-20 11:31 AM, Amal Elmah wrote:
Thanks for replying.
What I need to do is build a new model that can extract the names of recipes
from a specific cooking website.
Could you please correct me if I have made any mistakes:
- First, I made a training file (training.txt). In this file I collected a lot of
sentences that contain a recipe name, one sentence per line, for example:
<START>Shortbread<END> is an easy buttery biscuits as homemade Christmas
presents .
... etc
- Then I use the command line training tool to generate the new model.
- After that I will use this model in my application to process any new page
from this cooking website.
- The features will be extracted automatically by OpenNLP, so I do not need to
specify them; I just need to provide as much training data as I can (this is
what I understood).
Are all my steps right?
Do I need to do anything to make the results more accurate?
I appreciate your help
Best,
Amal
From: olivier.gri...@ensta.org
Date: Mon, 20 Jun 2011 10:06:23 +0200
Subject: Re: OpenNLP tool for NameFinder
To: opennlp-users@incubator.apache.org
2011/6/20 Amal Elmah<amalalthougha...@hotmail.com>:
Hi OpenNLP team,
I used the command line training tool for the NameFinder with the following
command:
$bin/opennlp TokenNameFinderTrainer -encoding UTF-8 -lang en -data
en-ner-person.train -model en-ner-person.bin
I do not know where to get en-ner-person.train, so I made a
training file (training.txt) and added training data as follows:
<START:person> Pierre Vinken <END> , 61 years old , will join the board as a
nonexecutive director Nov. 29 .
Mr . <START:person> Vinken <END> is chairman of Elsevier N.V. , the Dutch
publishing group .
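As a side note on the training format: as far as I understand it, each line holds one tokenized sentence, with tokens and the <START:type> / <END> tags separated by single spaces. A small helper can produce such lines from token lists, which avoids spacing mistakes when writing them by hand. This is only a sketch; the function name annotate and the entity type "recipe" are made up for the example.

```python
def annotate(tokens, spans, entity_type="recipe"):
    """Return one training line with <START:type> ... <END> around each span.

    tokens: the sentence, already split into tokens
    spans:  (start, end) token index pairs, end exclusive, non-overlapping
    """
    starts = {start for start, _ in spans}
    ends = {end for _, end in spans}
    out = []
    for i, token in enumerate(tokens):
        if i in ends:
            out.append("<END>")  # close a span that ended just before this token
        if i in starts:
            out.append(f"<START:{entity_type}>")
        out.append(token)
    if len(tokens) in ends:
        out.append("<END>")  # close a span that runs to the end of the sentence
    return " ".join(out)


sentence = "Pierre Vinken , 61 years old , will join the board .".split()
print(annotate(sentence, [(0, 2)], entity_type="person"))
```

Writing one such line per sentence to training.txt gives a file in the same shape as the two example lines above.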
My Questions are:
1- How can I add features if I want to use the command line training tool rather than
the API? Can you please give me an example if this is possible?
AFAIK in the current state feature extraction is only customizable
through the API.
2- Can we add features to the training data, I mean inside the annotation, as in
<START:person feature=value>?
No. What would be the use case? Can you give a concrete example of
such a manual feature annotation? What goal do you want to achieve
with such annotations?
3- Does the OpenNLP tool have a way to generate these features automatically from
the training data?
OpenNLP already generates its features automatically by combining
several feature extractors, as in:
https://svn.apache.org/repos/asf/incubator/opennlp/trunk/opennlp-tools/src/main/java/opennlp/tools/namefind/DefaultNameContextGenerator.java
None of those feature extractors expects any kind of manual
annotation. This is expected, since in general the text you want to
analyze with a NameFinder instance will not have any
annotations.
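To make the idea concrete, here is a toy sketch of features computed from the raw tokens alone. It is not OpenNLP's actual feature set or code, only an illustration of the general principle; the function names token_class and window_features and the feature string format are invented for this example.

```python
def token_class(token):
    """Coarse shape of a token (hypothetical classes, for illustration only)."""
    if token.isdigit():
        return "num"
    if token[:1].isupper():
        return "cap"
    if token.isalpha():
        return "lower"
    return "other"


def window_features(tokens, index, window=1):
    """Features for tokens[index], built from the token and its neighbours."""
    feats = []
    for offset in range(-window, window + 1):
        pos = index + offset
        if 0 <= pos < len(tokens):
            tok = tokens[pos]
            feats.append(f"w[{offset}]={tok.lower()}")
            feats.append(f"wc[{offset}]={token_class(tok)}")
    return feats


tokens = ["Mr", ".", "Vinken", "is", "chairman"]
print(window_features(tokens, 2))
```

Everything here is derived from the text itself, which is why the same features can be computed on new, unannotated pages at prediction time.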
--
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel