Ah, ok, I see what you mean... but then again, if it recognised it as a
mere token it would not throw "IncompatibleFormat" exceptions but rather
skip it as a token that is not of interest, wouldn't it? I don't have any
patches to send you; I just think that not requiring spaces around the
SGML tags is a wiser approach... unless of course you're extracting the
SGML tags via regex. The truth is I've not looked at the source, but I
would expect you to use some sort of XML-ish means to extract the SGML
tags. If your parser is using regex then I'm sure you have your reasons
for requiring the spaces. But anyway, this is a very small problem for
me because I can indeed sort it out manually. My big problem still
remains! Anyway, I'll stop bugging you. The fact that you tried to help
means a lot, and certainly if I sort everything out I'll post what the
problem was for future users.
Cheers,
Jim
On 08/02/12 16:41, Joern Kottmann wrote:
The parsing code for the format expects whitespace-tokenized text. The
<START> and <END> tags are handled differently and are not tokens in
this sense, but when you attach one directly to a word, as you did with
acid<START>, our parsing code just recognizes it as a token and not as
the tag that marks entity boundaries.
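
To make the behaviour described above concrete, here is a minimal sketch
in Python of a whitespace-based reader for this kind of annotated text.
It is not the actual parsing code; the function name parse_annotated and
the (tokens, spans) return shape are made up for illustration, and it
only assumes that a tag counts as a tag when it stands alone between
spaces.

```python
START_TAG = "<START>"
END_TAG = "<END>"


def parse_annotated(line):
    """Split on whitespace and collect entity spans (hypothetical helper).

    Returns (tokens, spans), where each span is a (begin, end) pair of
    token indices, with end exclusive.
    """
    tokens = []
    spans = []
    open_at = None  # token index where the current entity began
    for piece in line.split():
        if piece == START_TAG:
            open_at = len(tokens)
        elif piece == END_TAG:
            if open_at is not None:
                spans.append((open_at, len(tokens)))
                open_at = None
        else:
            # Anything that is not exactly a tag is an ordinary token,
            # including a word with a tag glued onto it.
            tokens.append(piece)
    return tokens, spans


# Tags separated by spaces: the entity boundaries are found.
print(parse_annotated("the <START> acid <END> reacts"))

# Tag attached to a word: "acid<START>" is just another token.
print(parse_annotated("acid<START> rain <END>"))
```

In the second call the piece "acid<START>" never compares equal to the
tag, so no entity is opened and the later <END> has nothing to close. A
stricter reader could reasonably raise an error at that point instead of
ignoring it.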