Re: Problem with openNLP Name Finder API....

Jim - FooBar(); Wed, 08 Feb 2012 02:16:22 -0800

Hi James,

Ok, I'll have a look at the new code, but to be honest I'm not having any problems with the case-sensitivity flag... Also, are you referring to the flag of the Dictionary, of the trained NER model, or both?

Sorry to ask the same question again, but how on earth is the dictionary supposed to recognise words like "Folic acid" when all that it sees are "Folic" & "acid"? I can understand that a properly trained model has generated features in order to learn what tokens to recognise. This usually includes the words before and after the token in question, amongst other things... But the dictionary, which is purely an exhaustive, iterative search based solely on a list of words, only looks at the token in question, so I don't think it is possible to find multi-word entities using the DictionaryNameFinder.
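
To make the question concrete: a lookup that only ever sees single tokens could still match "Folic acid" if its entries were stored as token sequences and compared against a sliding window over the tokenized sentence. The following is a minimal sketch of that idea in plain Java. The class and method names are made up for illustration, and this is not the DictionaryNameFinder code; whether the OpenNLP dictionary behaves this way is exactly what is being asked here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper (not part of OpenNLP): matches multi-token dictionary
// entries against a sentence that has already been split into tokens.
class MultiTokenDictionarySketch {

    private final Set<List<String>> entries = new HashSet<>();
    private int maxEntryLength = 0;

    // Store an entry as a lower-cased token sequence, e.g. ["folic", "acid"].
    void add(String... tokens) {
        List<String> key = new ArrayList<>();
        for (String token : tokens) {
            key.add(token.toLowerCase(Locale.ROOT));
        }
        entries.add(key);
        maxEntryLength = Math.max(maxEntryLength, tokens.length);
    }

    // Returns {start, endExclusive} token offsets, trying the longest
    // possible window first at each position.
    List<int[]> find(String[] tokens) {
        List<int[]> spans = new ArrayList<>();
        int i = 0;
        while (i < tokens.length) {
            int matchedLength = 0;
            int longest = Math.min(maxEntryLength, tokens.length - i);
            for (int len = longest; len >= 1 && matchedLength == 0; len--) {
                List<String> window = new ArrayList<>();
                for (int k = i; k < i + len; k++) {
                    window.add(tokens[k].toLowerCase(Locale.ROOT));
                }
                if (entries.contains(window)) {
                    matchedLength = len;
                }
            }
            if (matchedLength > 0) {
                spans.add(new int[] {i, i + matchedLength});
                i += matchedLength; // skip past the matched entity
            } else {
                i++;
            }
        }
        return spans;
    }

    public static void main(String[] args) {
        MultiTokenDictionarySketch dictionary = new MultiTokenDictionarySketch();
        dictionary.add("Folic", "acid");
        dictionary.add("Domoic", "acid");
        String[] sentence = {"Patients", "took", "folic", "acid", "daily"};
        for (int[] span : dictionary.find(sentence)) {
            System.out.println(Arrays.toString(span));
        }
    }
}
```

The sketch lower-cases everything, so it ignores the case-sensitivity flag discussed above, and it still depends entirely on the tokenizer producing "Folic" and "acid" as separate, clean tokens.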

Of course, in my case I've got the same problem with both approaches, which comes down to the tokenization that happens before the dictionary or the maxent model takes over. But at least I can understand how the maxent model can look for multi-word entities, whereas I certainly don't understand how the Dictionary can...

One of my fellow researchers suggested that, as far as the maxent model is concerned, I should look at the confidence probabilities of each token in each sentence and concatenate any tokens with low enough confidence probabilities in order to get my multi-word tokens. Of course, that is assuming that both "Folic" and "acid" are found by the NER and that they have very similar confidence probabilities. But that is not the case with me... I mean, it finds "Folic" and "Domoic" but NOT "acid", which is what follows both folic and domoic... So again, dead end!!!
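
For reference, the suggested post-processing could look roughly like the sketch below, assuming the name finder's hits are available as token offsets with one confidence value per hit. The class name, the method and the `maxDiff` threshold are hypothetical, and nothing here calls the OpenNLP API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical post-processing step (not an OpenNLP API): merges hits that
// sit directly next to each other and have similar confidence values.
class AdjacentSpanMergeSketch {

    // hits[i] = {start, endExclusive} in token offsets, sorted by start;
    // probs[i] = confidence of hits[i]; maxDiff = assumed similarity threshold.
    static List<int[]> merge(int[][] hits, double[] probs, double maxDiff) {
        List<int[]> merged = new ArrayList<>();
        int i = 0;
        while (i < hits.length) {
            int start = hits[i][0];
            int end = hits[i][1];
            int j = i + 1;
            // Extend the current span while the next hit starts exactly where
            // this one ends and its confidence is close to the previous hit's.
            while (j < hits.length
                    && hits[j][0] == end
                    && Math.abs(probs[j] - probs[j - 1]) <= maxDiff) {
                end = hits[j][1];
                j++;
            }
            merged.add(new int[] {start, end});
            i = j;
        }
        return merged;
    }

    public static void main(String[] args) {
        // Made-up offsets and confidences, for illustration only.
        int[][] hits = {{2, 3}, {3, 4}, {6, 7}};
        double[] probs = {0.90, 0.85, 0.40};
        for (int[] span : merge(hits, probs, 0.1)) {
            System.out.println(Arrays.toString(span));
        }
    }
}
```

This only helps when every part of the entity is found in the first place: if "acid" never appears among the hits, there is nothing to merge "Folic" with.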

Is there any chance you can point me to the person who wrote the official tutorial for openNLP, or the person who actually did the training for the provided person name-finder model? I really want to hear his/her opinion on the matter... My problem is that the tutorial shows how to recognize person names (which are 2-3 words most of the time), but I strongly suspect that this is impossible without doing something extra which the tutorial unfortunately does not show...

By the way, the words I'm choosing to mention are not random... The words folic acid and domoic acid appear at least 200 times in the training data! The word "acid" appears at least 1000 times as part of other drug names, but it is still not recognised by the name-finder!!! Not even as a single token, the way folic and domoic are...


Regards,
Jim

P.S.: have you ever done any serious NER (not for demonstration purposes) using openNLP?





On 08/02/12 01:23, James Kosin wrote:

On 2/7/2012 5:29 AM, Jim - FooBar(); wrote:

Hey James,

First of all thanks for taking the time to reply to my massive e-mail.
I really appreciate it...

Secondly i think you slightly misunderstood my problems..to be fair
it's quite complicated!

Ok, so i'm using openNLP 1.5.2 so i don't think there is any newer
nameFinder code.

Hi again,

Actually, SVN has a later version that fixes a problem with the name
matching over long series.  A bug in the code was causing the case
sensitivity flag to go away when matching other tokens in a series.
This was due to an attempted optimization that was working before we
fixed the case sensitivity issues which broke the optimization.

James
