Ah, OK, I see what you mean... but then again, if it recognised it as a mere token, it would not throw "IncompatibleFormat" exceptions but rather skip it as a token that is not of interest, wouldn't it? I don't have any patches to send you; I just think that not requiring spaces around the SGML tags would be the wiser approach... unless of course you're extracting the SGML tags via regex. The truth is I've not looked at the source, but I would expect you to use some sort of XML-ish means to extract the tags. If your parser is using regex, then I'm sure you have your reasons for requiring the spaces. But anyway, this is a very small problem for me because I can indeed sort it out manually... My big problem still remains!

Anyway, I'll stop bugging you... The fact that you tried to help means a lot, and if I sort everything out I'll certainly post what the problem was for future users.

Cheers,
Jim


On 08/02/12 16:41, Joern Kottmann wrote:
The parsing code for the format expects white space tokenized text. The
<START> and <END> tags are handled differently and are not
tokens in this sense, but when you attach one directly to a word, as you
did with acid<START>, our parsing code just recognizes it as a token
and not as a tag that marks entity boundaries.
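
To illustrate the point above, here is a minimal sketch in Python of how a white space based reader would see the tags. It is not the project's actual parsing code: the function name find_spans, the tag pattern and the sample sentences are all assumptions made up for illustration. It only shows why a tag glued to a word is never seen as a tag.

```python
import re

# Assumed tag shapes for this sketch: <START> or <START:type>, and <END>.
START_TAG = re.compile(r"^<START(:[^>\s]+)?>$")
END_TAG = "<END>"


def find_spans(line):
    """Split on white space; return (tokens, spans).

    Each span is a (begin, end) pair of token indices, end exclusive.
    A piece only counts as a tag if it stands alone between white space.
    """
    tokens = []
    spans = []
    begin = None
    for piece in line.split():
        if START_TAG.match(piece):
            begin = len(tokens)          # the next token opens the entity
        elif piece == END_TAG:
            if begin is not None:
                spans.append((begin, len(tokens)))
                begin = None
            # An <END> without a matching start is ignored in this sketch;
            # a stricter reader might raise an error here instead.
        else:
            tokens.append(piece)         # ordinary token, tags glued to words included
    return tokens, spans


# Tags separated by white space: the entity boundaries are found.
good = "treated with <START> acetic acid <END> daily"
# Tag attached to a word: "acid<START>" is just one ordinary token.
bad = "the acid<START> aspirin <END> helps"

print(find_spans(good))
print(find_spans(bad))
```

In the second sentence the start tag is never recognised, so the later <END> has nothing to close. That is consistent with a reader complaining about the format rather than silently skipping the token.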
