Re: Problem with openNLP Name Finder API....

Jim - FooBar(); Wed, 08 Feb 2012 07:45:50 -0800

That is how we defined the training format. The <START> and <END> tags MUST
be whitespace-separated, otherwise it will be recognized as a token.
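To see why the spaces matter, here is a minimal Java sketch. It assumes, as Jörn's reply implies, that each training line is split on whitespace before the tags are interpreted; the class TagTokenization is hypothetical and not part of OpenNLP. Without the spaces, the tags end up glued to the neighbouring words, so no standalone <START:drug> or <END> token exists.

```java
import java.util.Arrays;

// Hypothetical demo class (not OpenNLP): shows how a whitespace split
// sees spaced vs. unspaced annotation tags.
public class TagTokenization {
    // Splits a training line on runs of whitespace, the way a
    // whitespace tokenizer would see it.
    public static String[] tokens(String line) {
        return line.trim().split("\\s+");
    }

    public static void main(String[] args) {
        String spaced = "<START:drug> Folic acid <END> is a vitamin .";
        String unspaced = "<START:drug>Folic acid<END> is a vitamin .";
        // Tags are separate tokens here.
        System.out.println(Arrays.toString(tokens(spaced)));
        // Tags are fused with "Folic" and "acid" here.
        System.out.println(Arrays.toString(tokens(unspaced)));
    }
}
```

In the unspaced line the first two tokens are "<START:drug>Folic" and "acid<END>", which is what "recognized as a token" refers to: the tag becomes part of an ordinary token instead of being read as markup.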

I think you meant it *WON'T* be recognised as a token... a mere typo, or have I misunderstood? I mean, it's pretty obvious from all the exceptions thrown while counting the events that the spaces are necessary! I'm just wondering why you chose to do it that way in the first place. No spaces makes a lot more sense to me, regardless of the problem I am having with the regex replacement...

Have a look at our documentation. The NER code you see there is correct.
If you have problems detecting multi-token names, I suspect that something
is wrong with your training data.

I've spent the last 2 weeks reading the docs, and I've read practically all the external sources on Stack Overflow and elsewhere. However, everyone's demo is about the same thing shown in the docs, which is the person name finder (which happens to include a multi-word name, Pierre Vinken)! As far as the training data is concerned, I've checked it systematically! I had to, because, as I said in the beginning, I had nested tags which had to be sorted out manually... I spent a whole day doing that, but at least I was thinking "Finally, I am so close to training...!". On top of that, if there were something wrong with my training data I would expect exceptions again, but I'm not getting any since I sorted out the nested tags!

The Name Finder takes one tokenized sentence at a time. After you are done
with a document, you should clear the adaptive data.
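In OpenNLP the calls involved are NameFinderME.find(String[]) for each sentence and clearAdaptiveData() at the end of each document. The sketch below deliberately does not use the library: AdaptiveToyFinder is a hypothetical toy that only loosely imitates document-level (adaptive) memory by remembering tokens it has already tagged, to show what the reset at a document boundary does. In the toy, if all papers were fed in as one document, memory from one paper would carry into the next.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical toy finder, NOT OpenNLP. It mimics only the shape of
// find(String[]) and clearAdaptiveData() to illustrate per-document memory.
public class AdaptiveToyFinder {
    // Tokens tagged earlier in the current document (the "adaptive data").
    private final Set<String> seenNames = new HashSet<>();

    // Tags a token if it directly follows the cue word "drug", or if it
    // was already tagged earlier in the same document.
    public List<String> find(String[] tokens) {
        List<String> names = new ArrayList<>();
        for (int i = 0; i < tokens.length; i++) {
            boolean cued = i > 0 && tokens[i - 1].equalsIgnoreCase("drug");
            if (cued || seenNames.contains(tokens[i])) {
                names.add(tokens[i]);
            }
        }
        seenNames.addAll(names); // remember for later sentences
        return names;
    }

    // Forgets everything remembered from the current document.
    public void clearAdaptiveData() {
        seenNames.clear();
    }

    public static void main(String[] args) {
        String[][][] documents = {
            { {"the", "drug", "aspirin", "was", "given"}, {"aspirin", "helped"} },
            { {"aspirin", "was", "not", "studied", "here"} }
        };
        AdaptiveToyFinder finder = new AdaptiveToyFinder();
        for (String[][] document : documents) {
            for (String[] sentence : document) {
                System.out.println(finder.find(sentence));
            }
            finder.clearAdaptiveData(); // reset at each document boundary
        }
    }
}
```

Here the bare mention of "aspirin" in the second document is not tagged because the memory was cleared; dropping the clearAdaptiveData() call would let the first document's memory tag it. Whether carrying such memory across papers helps or hurts depends on the data.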

In order to avoid doing that, I have merged all 383 papers into a single one with "cat *.txt > merged.txt", and I'm treating it as a single document...

Is that a problem? I don't see how it could be...

Regards,
Jim




On 08/02/12 15:29, Joern Kottmann wrote:

On Mon, Feb 6, 2012 at 6:15 PM, Jim - FooBar(); <jimpil1...@gmail.com> wrote:

Now the easy and straightforward solution to that is to NOT add spaces in
the tag, like this:

<START:drug>Folic acid<END>    (this will NOT match "acid" in later parsing)

I honestly wasn't expecting that to make any difference to the training
process, but as it turns out, it breaks it completely.



That is how we defined the training format. The <START> and <END> tags MUST
be whitespace-separated, otherwise it will be recognized as a token.
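If annotations were written without the spaces, one way to repair them is a regex pass before training. This is a minimal sketch, assuming tags of the form <START:type> and <END> as used in this thread; TagSpacer and padTags are hypothetical names, not OpenNLP API.

```java
import java.util.regex.Pattern;

// Hypothetical helper (not OpenNLP): inserts the whitespace the
// training format requires around annotation tags.
public class TagSpacer {
    // Matches an opening tag such as <START:drug> or a closing <END> tag.
    private static final Pattern TAG = Pattern.compile("<START:[^>\\s]+>|<END>");

    // Surrounds every tag with spaces ($0 is the whole match), then
    // collapses runs of spaces and trims the line.
    public static String padTags(String line) {
        String padded = TAG.matcher(line).replaceAll(" $0 ");
        return padded.replaceAll(" {2,}", " ").trim();
    }

    public static void main(String[] args) {
        System.out.println(padTags("<START:drug>Folic acid<END> is a vitamin."));
    }
}
```

Because runs of spaces are collapsed afterwards, lines that are already correctly spaced come out unchanged apart from leading or trailing whitespace being trimmed. The collapse also normalises double spaces elsewhere in the line, which should not matter if the training data is whitespace-tokenized anyway.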

Jörn
