Re: Is Using NER The Right Approach?

Jörn Kottmann Mon, 15 Apr 2013 00:55:55 -0700

On 04/15/2013 02:31 AM, Richard Head Jr. wrote:

I have a bunch of sentences like the following:


Guacamole Dip: 5 Hass Avocados, Jalapeno Puree with Salt and BHT (preservative).

They are standalone, i.e., they are not contained within a larger 
paragraph/document structure.

I want to tag various words, creating the following:

Guacamole Dip: 5 Hass <START:term>Avocados<END>, <START:term>Jalapeno<END> Puree with 
<START:term>Salt<END> and <START:term>BHT<END> (preservative).

Looking through the mailing list for guidance, I came across this:

http://mail-archives.apache.org/mod_mbox/opennlp-users/201205.mbox/%3C4FA1EE7E.2080608%40gmail.com%3E

Which made me think that, before going through 100 or so documents and tagging 
the words to create training data, I should get some clarification on the 
following:

1. Is NER the right tool for this?
2. My training data is somewhat small (~100 sentences). Will this stymie my goal 
above?
3. Were the poor results the gentleman had with Italian addresses in part due to 
a bug mentioned here:
http://mail-archives.apache.org/mod_mbox/opennlp-users/201205.mbox/%3C4FA1EF10.2020904%40gmail.com%3E
4. Is it possible to use a text file containing only terms, or a tab delimited 
file like the ones the Stanford NER uses?

Yes, the NER should be capable of detecting the terms, but you could also try to use a dictionary.
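To make the dictionary idea concrete, here is a minimal sketch in plain Python. It does not use OpenNLP at all; the term list, the tokenizer, and the function names are assumptions made for illustration. It looks up single-word terms only and writes the tags as separate whitespace-delimited tokens, which, as far as I know, is how the name finder training format expects them.

```python
import re

# Hypothetical term list; in practice this would be loaded from a file.
TERMS = {"avocados", "jalapeno", "salt", "bht"}


def tokenize(sentence):
    # Split into runs of word characters and single punctuation marks.
    return re.findall(r"\w+|[^\w\s]", sentence)


def tag_terms(sentence, terms=TERMS):
    # Wrap every token found in the dictionary (case-insensitive)
    # in <START:term> ... <END>, leaving all other tokens unchanged.
    out = []
    for token in tokenize(sentence):
        if token.lower() in terms:
            out.append("<START:term> " + token + " <END>")
        else:
            out.append(token)
    return " ".join(out)


if __name__ == "__main__":
    print(tag_terms(
        "Guacamole Dip: 5 Hass Avocados, Jalapeno Puree with Salt "
        "and BHT (preservative)."
    ))
```

A lookup like this can also be used to pre-annotate complete sentences, which are then corrected by hand before being used as training data. Multi-word terms would need a longest-match search over token sequences, which this sketch leaves out.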

Your training data is too small, especially when you train with a cutoff of 5 and the maxent model; the perceptron will work better. Label more data until you have a few thousand sentences.
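To illustrate why a cutoff of 5 hurts on a small corpus, here is a rough sketch in plain Python. It assumes the cutoff simply drops every feature seen fewer than that many times, and it uses lowercased word identity as a stand-in for the much richer features the name finder really generates. The function name and the sample sentences are made up for the example.

```python
from collections import Counter


def surviving_features(sentences, cutoff):
    # Count how often each (simplified) feature occurs in the corpus
    # and keep only those seen at least `cutoff` times.
    counts = Counter(
        token.lower() for sentence in sentences for token in sentence.split()
    )
    return {feature for feature, n in counts.items() if n >= cutoff}


if __name__ == "__main__":
    corpus = ["salt and pepper", "salt and sugar", "salt with lime"]
    for cutoff in (1, 2, 5):
        print(cutoff, sorted(surviving_features(corpus, cutoff)))
```

On this toy corpus a cutoff of 1 keeps all six words, a cutoff of 2 keeps only "and" and "salt", and a cutoff of 5 keeps nothing. With around 100 sentences most term-specific features are likely to fall below the threshold in the same way.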

The mentioned bug was fixed in 1.5.3, but it only occurred in multi-type models.

You need complete sentences to train the NER model; just using the terms does not work. No, we do not support the Stanford format.


Jörn
