Re: Debugging NameFinder

Olivier Grisel Sun, 04 Dec 2011 02:59:04 -0800

2011/12/4 Erel Segal <[email protected]>:
> Hello,
>
> I am trying to use NameFinder for a project of detecting organization names
> in Wikipedia. I use the *en-ner-organization.bin* model that I downloaded
> from the site, and I get strange results. Can you please help me understand
> these results, and what I should do to correct them?
>
> I ran NameFinder on two sentences from this page:
> http://en.wikipedia.org/wiki/Air_France
>
> The first sentence was: "*For its Première cabin , Air France 's first
> class menu is designed by Guy Martin , chef of Le Grand Vefour , a Michelin
> three-star restaurant in Paris .*" For this sentence, the name finder
> correctly tagged "*Air France*" as an organization.
>
> However, the second sentence was "*The following day , Air France was
> further instructed to share African routes with Air Afrique and UAT .*".
> For this, the name finder tagged only "*Air*" as an organization.
>
> This seems strange as the two contexts seem similar. How can you explain
> this?


In the first example, the possessive marker "'s" is probably a strong
clue that the tokens right before it form some kind of named entity
(here, an organization), whereas that clue is missing in the second
example. But since OpenNLP NER models are statistical models with many
dimensions (features), it is hard to single out any one reason for an
individual failure.

Anyway, the only way to fix this is to retrain the model on a larger
annotated dataset than the one used to build the default model, or to
come up with better features (the only way to tell whether they help
is to evaluate them on an annotated corpus). Unfortunately, both tasks
require a significant investment.
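
To make "evaluating on an annotated corpus" concrete, here is a minimal
sketch of span-level scoring in plain Python. It does not use OpenNLP:
the function name `evaluate`, the span layout, and the gold annotations
are all assumptions made up for illustration (the gold spans are
hand-written for the two sentences above, not taken from any real
corpus). A predicted span only counts as correct if its boundaries and
type match a gold span exactly, so tagging "Air" instead of
"Air France" is both a false positive and a miss.

```python
from typing import List, Set, Tuple

# (start token index, end token index exclusive, entity type)
Span = Tuple[int, int, str]


def evaluate(gold: List[Set[Span]], predicted: List[Set[Span]]) -> Tuple[float, float, float]:
    """Exact-match precision, recall and F1 over per-sentence span sets."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must cover the same sentences")
    tp = fp = fn = 0
    for gold_spans, pred_spans in zip(gold, predicted):
        tp += len(gold_spans & pred_spans)   # exact boundary and type match
        fp += len(pred_spans - gold_spans)   # predicted but not in gold
        fn += len(gold_spans - pred_spans)   # in gold but not predicted
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical gold annotations for the two sentences from the question.
gold = [
    {(5, 7, "organization")},                                  # Air France
    {(4, 6, "organization"), (14, 16, "organization"),
     (17, 18, "organization")},                                # Air France, Air Afrique, UAT
]
# Output as described in the question: "Air France", then only "Air".
predicted = [
    {(5, 7, "organization")},
    {(4, 5, "organization")},
]

precision, recall, f1 = evaluate(gold, predicted)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

Running the same scoring before and after a change to the training data
or the features, on held-out annotated sentences, is what tells you
whether the change actually helped.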

-- 
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel
