It puts spaces within the tag - are you 100% positive it puts spaces
outside the tag too? Generally this doesn't matter as tokens are already
space-tokenised but surely you can imagine a case where the text was
"...said to Mr. John." and the annotation would normally be "... said to
Mr. <START:person> John <END>.". If your script forces spaces before
and after the tag, in the majority of cases you would end up with double
spaces everywhere. My annotator for opennlp does not enforce spaces
outside the tag and assumes the user can sort these few weird cases with
an editor which supports regex.
Jim
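For what it's worth, the double-space worry above can be sidestepped by swallowing whatever whitespace already surrounds a tag before putting a single space back. The following is only a rough Python sketch of that idea, not how any particular annotator does it. The function name normalize_tag_spacing is made up for illustration, and the tag pattern assumes tags look like <START:type>, <START> or <END>.

```python
import re

# Assumed tag shapes: <START:person>, <START> or <END>, plus any
# whitespace that already sits directly before or after the tag.
TAG = re.compile(r"\s*(<START(?::[^>\s]+)?>|<END>)\s*")


def normalize_tag_spacing(line: str) -> str:
    """Return the line with exactly one space on each side of every tag.

    Hypothetical helper: works on a single line and drops any leading or
    trailing whitespace, including the line's newline.
    """
    # Swallow the existing whitespace around each tag, then re-add one space per side.
    spaced = TAG.sub(r" \1 ", line)
    # Back-to-back tags would still produce two spaces in a row; collapse those runs.
    return re.sub(r"[ \t]{2,}", " ", spaced).strip()


if __name__ == "__main__":
    print(normalize_tag_spacing("...said to Mr. <START:person>John<END>."))
```

Note that this also collapses double spaces elsewhere in the line, which should be harmless if the tokens are whitespace-separated anyway.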
On 20/11/13 17:03, Walrus theCat wrote:
Hi Jim,
Thanks for your interest. I realize that's how most other people solved
this error message, but it's not applicable in my case. The code errors
out on the first document, which doesn't commit this formatting error, and
it's not possible for any of my text to be formatted like that because the
script that generates it puts in spaces. To be thorough, I did search the
docs and nothing comes up. Does anyone have any ideas what could be wrong
here?
Thanks
On Wed, Nov 20, 2013 at 2:38 AM, Jim - FooBar(); <[email protected]> wrote:
On 20/11/13 07:23, Walrus theCat wrote:
In training my NameFinderME, I get the following error message:
Computing event counts... java.io.IOException: Found unexpected
annotation:
In everything else Google has found me for this error message, it's always
a simple error in the spacing of the training data (e.g., change
<START:entity>some text<END> to <START:entity> some text <END>). This isn't
applicable to me (it's all correctly spaced). It's all UTF-16, and specified
to be so when I set up the objects to do the training. Any ideas on what
could be wrong?
Thank you
Press Ctrl+F in your favourite editor and search 'n' replace ">." with ">
." and possibly ">," with "> ,". I've been bitten by this before :)
hope that helps,
Jim
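Before doing a blind search and replace, it may help to list the lines where a tag is glued to its neighbouring text, so you can see whether the first document really is clean. This is a rough Python sketch under assumptions: find_glued_tags is a hypothetical helper name, and the pattern assumes tags look like <START:type>, <START> or <END>.

```python
import re

# Assumed tag shapes: <START:person>, <START> or <END>.
_TAG = r"(?:<START(?::[^>\s]+)?>|<END>)"
# A tag with a non-whitespace character immediately before or after it.
GLUED = re.compile(r"\S" + _TAG + "|" + _TAG + r"\S")


def find_glued_tags(lines):
    """Return (line_number, line) pairs where a tag touches adjacent text.

    Hypothetical helper: `lines` is any iterable of strings, for example
    an open file object. Line numbers start at 1.
    """
    return [
        (number, line.rstrip("\n"))
        for number, line in enumerate(lines, start=1)
        if GLUED.search(line)
    ]


if __name__ == "__main__":
    sample = [
        "He said to Mr. <START:person> John <END> .\n",
        "He said to Mr. <START:person> John <END>.\n",
    ]
    for number, line in find_glued_tags(sample):
        print(number, line)
```

To run it over the real training data, pass an open file instead of the sample list, opened with the encoding the data is actually in (encoding="utf-16" if it really is UTF-16).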