On Wed, Feb 8, 2012 at 5:56 PM, Jim - FooBar(); <jimpil1...@gmail.com> wrote:

> Ah, OK, I see what you mean... but then again, if it recognised it as a mere
> token, it would not throw "IncompatibleFormat" exceptions but rather skip it
> as a token that is not of interest, wouldn't it? I don't have any patches to
> send you; I just think that not including spaces in the SGML tag is a wiser
> approach... unless, of course, you're extracting the SGML tags via regex.
> The truth is I've not looked at the source, but I would expect you to use
> some sort of XML-ish means to extract the SGML tags. If your parser is using
> regex then I'm sure you have your reasons for including the spaces. But
> anyway, this is a very small problem for me because I can indeed sort it
> manually... My big problem still remains!
>

The code splits the input string by line and then by white space. Each of
the individual parts then either matches one of our start and end tags or
it does not.
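
To illustrate why the spaces around the tags matter, here is a minimal sketch
of that kind of splitting, written in Python for brevity. It is not the
OpenNLP source: the function names are made up for this example, and the tag
shapes (<START:type> and <END>) are an assumption modelled on the name finder
training format.

```python
import re

# Assumed tag shapes; the real parser may accept something different.
START_TAG = re.compile(r"^<START(?::([^:>\s]+))?>$")
END_TAG = "<END>"


def parse_line(line):
    """Split one line on white space and collect (type, begin, end) spans.

    A part is treated as a tag only if the WHOLE part matches a tag;
    anything else is kept as an ordinary token.
    """
    tokens = []
    spans = []
    in_span = False
    span_type = None
    span_begin = 0
    for part in line.split():
        match = START_TAG.match(part)
        if match:
            if in_span:
                raise ValueError("start tag inside an open span: " + part)
            in_span = True
            span_type = match.group(1) or "default"
            span_begin = len(tokens)
        elif part == END_TAG:
            if not in_span:
                raise ValueError("end tag without a start tag")
            spans.append((span_type, span_begin, len(tokens)))
            in_span = False
        else:
            tokens.append(part)
    if in_span:
        raise ValueError("missing end tag")
    return tokens, spans


def parse_text(text):
    """Split the input by line first, then parse each non-empty line."""
    return [parse_line(line) for line in text.splitlines() if line.strip()]
```

In this sketch, "<START:person> Pierre Vinken <END> , 61 years old" yields one
"person" span over the first two tokens, while "<START:person>Pierre
Vinken<END>" yields no span at all, because a tag glued to a word is just
another token after splitting on white space.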



> Anyway, I'll stop bugging you... The fact that you tried to help means a lot,
> and certainly if I sort everything out I'll post what the problem was for
> future users...
>
>
We are also interested in why it does not work for you; we usually use this
kind of experience to improve OpenNLP.

Would it be possible for you to show us a sample of your training data?
Maybe one paper.

Jörn
