Any chance you remember whether you tokenized the sentences *and pos-tagged the tokens* before feeding them to the maxent NER model? I'm asking because the docs say you *ONLY* need to tokenize sentences before sending them over to the trained model. However, I just stumbled upon this website: http://tech.knime.org/named-entity-recognizer-and-tag-cloud-example

which states:

" After pos tagging, the names entities can be tagged. The "OpenNLP NE tagger" node uses an OpenNLP NER model to tag the data. It is suggested to apply the NE tagger nodes after the pos tagger, in order to keep the named entities consisting of multi-words. "

According to this, I must pos-tag the tokens and NOT SIMPLY tokenize them if I want to keep multi-word entities as one! Could this be the case? Can you remember the details from your case?
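
To make the question concrete, here is a minimal sketch of what I mean by keeping a multi-word entity "as one". It assumes the finder receives nothing but an array of tokens and reports each entity as a [start, end) range of token indices, which as far as I know is how NameFinderME.find(String[]) reports its Span results. The KNOWN_NAMES lookup and find_names function are hypothetical stand-ins for a trained model, not OpenNLP code; no pos tags appear anywhere in the input.

```python
from typing import List, Tuple

# Hypothetical gazetteer standing in for a trained model (an assumption for
# illustration only). Each entry is a tuple of consecutive tokens that
# together form one entity.
KNOWN_NAMES = {("New", "York"), ("Jim",)}


def find_names(tokens: List[str]) -> List[Tuple[int, int]]:
    """Return [start, end) token-index spans, preferring the longest match."""
    spans = []
    max_len = max(len(name) for name in KNOWN_NAMES)
    i = 0
    while i < len(tokens):
        matched = False
        # Try the longest candidate first so multi-word names win.
        for length in range(min(max_len, len(tokens) - i), 0, -1):
            if tuple(tokens[i:i + length]) in KNOWN_NAMES:
                spans.append((i, i + length))
                i += length
                matched = True
                break
        if not matched:
            i += 1
    return spans


# Plain whitespace tokenization only; the tokens carry no pos tags.
tokens = "Jim moved to New York last year".split()
spans = find_names(tokens)
names = [" ".join(tokens[start:end]) for start, end in spans]
```

Here "New York" comes back as a single span covering two tokens, so in this toy setup the multi-word entity survives without any pos-tagging step. Whether the real maxent model behaves the same way is exactly what I am asking.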

Regards,
Jim



On 08/02/12 11:44, Aliaksandr Autayeu wrote:
Yes, we had multi-word entities. Actually, the dataset was quite "dirty" and
"funny" - there were names like "al`XXX" and "al XXX" and some others where
the separator was some funny Unicode character. But I don't remember any
problems similar to those you have (I followed the thread). But that was
OpenNLP 1.4.0 or 1.4.3, somewhere in that range. I don't have exact figures
now, but I've fished out a precision (for one class) from an old
email: 80.98%

Aliaksandr

On Wed, Feb 8, 2012 at 11:45 AM, Jim - FooBar(); <jimpil1...@gmail.com> wrote:

Hi there Autayeu,

Did you have any multi-word entities in your annotated corpus?
If yes, how did the maxent NER model perform? Could it find them or was it
just finding single-word entities?
If you don't understand why I'm asking, have a look at the previous
messages...

I really appreciate the help...

Regards,
Jim



On 08/02/12 10:39, Aliaksandr Autayeu wrote:

P.S.: have you ever done any serious NER (not for demonstration purposes)
using OpenNLP?

I did experiments (more than a year ago, with 1.4.3) for standard three
classes, got the state of the art for our private corpus, but then we
changed approach.

Aliaksandr


