It puts spaces within the tag, but are you 100% positive it puts spaces outside the tag too? Generally this doesn't matter, as tokens are already space-tokenised, but surely you can imagine a case where the text was "...said to Mr. John." and the annotation would normally be "... said to Mr. <START:person> John <END>.". If your script forced spaces before and after the tag, in the majority of cases you would end up with double spaces everywhere. My annotator for OpenNLP does not enforce spaces outside the tag and assumes the user can sort out these few weird cases with an editor that supports regex.

Jim
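
To make the point above concrete, here is a minimal sketch of a check for tags that touch a neighbouring token, such as the "<END>." case. It is only an illustration: the function name find_unspaced_tags and the tag pattern are my own assumptions about the training format, not anything taken from OpenNLP itself.

```python
import re

# Assumed tag shapes: <START:type>, <START> and <END>.
TAG = re.compile(r"<START(?::[^>\s]+)?>|<END>")


def find_unspaced_tags(line):
    """Return (tag, column) pairs for tags not surrounded by whitespace."""
    problems = []
    for m in TAG.finditer(line):
        # A tag is fine if it sits at a line edge or next to whitespace.
        before_ok = m.start() == 0 or line[m.start() - 1].isspace()
        after_ok = m.end() == len(line) or line[m.end()].isspace()
        if not (before_ok and after_ok):
            problems.append((m.group(), m.start()))
    return problems


if __name__ == "__main__":
    sample = "said to Mr. <START:person> John <END>."
    for tag, col in find_unspaced_tags(sample):
        print(f"column {col}: {tag} is not surrounded by spaces")
```

Running something like this over every line of the training file should show quickly whether the generating script really does put spaces outside the tags as well as inside them.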

On 20/11/13 17:03, Walrus theCat wrote:
Hi Jim,

Thanks for your interest.  I realize that's how most other people solved
this error message, but it's not applicable in my case.  The code errors
out on the first document, which doesn't commit this formatting error, and
it's not possible for any of my text to be formatted like that because the
script that generates it puts in spaces.  To be thorough, I did search the
docs and nothing comes up.  Does anyone have any ideas what could be wrong
here?

Thanks


On Wed, Nov 20, 2013 at 2:38 AM, Jim - FooBar(); <[email protected]> wrote:

On 20/11/13 07:23, Walrus theCat wrote:

In training my NameFinderME, I get the following error message:

Computing event counts... java.io.IOException: Found unexpected annotation:

In everything else Google has found me for this error message, it's always
a simple error in the spacing of the training data (e.g., change
<START:entity>some text<END> to <START:entity> some text <END>). This isn't
applicable to me (it's all correctly spaced). It's all UTF-16, and specified
to be so when I set up the objects to do the training. Any ideas on what
could be wrong?

Thank you


Press Ctrl+F in your favourite editor and search 'n' replace ">." with
"> ." and possibly ">," with "> ,". I've been bitten by this before :)

hope that helps,
Jim
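
For anyone who would rather script the search and replace suggested above, here is a rough sketch. It generalises the ">." and ">," cases to any non-space character touching a tag, and it leaves correctly spaced tags alone, so it should not introduce double spaces. The function name space_out_tags and the tag pattern are assumptions made for illustration only.

```python
import re

# Assumed tag shapes: <START:type>, <START> and <END>.
_TAG = r"(<START(?::[^>\s]+)?>|<END>)"

# A tag immediately followed by a non-space character, e.g. "<END>."
AFTER_TAG = re.compile(_TAG + r"(?=\S)")
# A tag immediately preceded by a non-space character, e.g. "text<END>"
BEFORE_TAG = re.compile(r"(?<=\S)" + _TAG)


def space_out_tags(line):
    """Insert a single space wherever a tag touches a neighbouring token."""
    line = AFTER_TAG.sub(r"\1 ", line)
    line = BEFORE_TAG.sub(r" \1", line)
    return line


if __name__ == "__main__":
    print(space_out_tags("said to Mr. <START:person> John <END>."))
    print(space_out_tags("<START:entity>some text<END>"))
```

It is worth diffing the output against the original file before retraining, since a blanket replace can also touch lines you did not expect.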

