On Wed, Feb 8, 2012 at 5:56 PM, Jim - FooBar(); <jimpil1...@gmail.com> wrote:

> aaa ok i see what you mean...but then again if it recognised it as a mere
> token it would not throw "IncompatibleFormat" exceptions but rather skip it
> as a token that is not of interest wouldn't it? I don't have any patches to
> send you, i just think that not including spaces in the sgml tag is a more
> wise approach...Unless of course you're extracting the sgml tags via
> regex...The truth is i've not looked at the source but i would expect you
> to use some sort of xml-ish means to extract the sgml tags. If your parser
> is using regex then i'm sure you have your reasons for including the
> spaces. But anyway, this is a very small problem for me cos i can indeed
> sort it manually...My big problem still remains!!!
>

The code splits the input string by line and then by white space. Each
individual part then either matches one of our start and end tags or it
does not.

> Anyway I'll stop bugging you...the fact that you tried to help means a lot
> and certainly if i sort everything out i'll post what the problem was for
> future users...
>

We are also interested in why it does not work for you; we usually use this
kind of experience to improve OpenNLP. Would it be possible for you to show
us a sample of your training data? Maybe one paper.

Jörn
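
For readers following the thread, here is a minimal sketch of the splitting described above: split by line, then by white space, then check each part against the start and end tags. It is written in Python for brevity and is not the OpenNLP source; the tag shapes (`<START:type>` and `<END>`), the function names `parse_line` and `parse_text`, and the error handling are all assumptions made for illustration.

```python
import re

# Assumed tag shapes, for illustration only (not copied from OpenNLP).
START_TAG = re.compile(r"<START(?::([^:>\s]+))?>")
END_TAG = "<END>"


def parse_line(line):
    """Split one line on white space and collect tagged spans.

    Returns (tokens, spans), where tokens excludes the tags and each span
    is (type, start_index, end_index) over the token list.
    """
    tokens = []
    spans = []
    open_span = None  # (type, start_index) of the span being read, if any
    for part in line.split():
        match = START_TAG.fullmatch(part)
        if match:
            if open_span is not None:
                raise ValueError("start tag inside an open span: " + part)
            open_span = (match.group(1) or "default", len(tokens))
        elif part == END_TAG:
            if open_span is None:
                raise ValueError("end tag without a start tag")
            spans.append((open_span[0], open_span[1], len(tokens)))
            open_span = None
        else:
            # Anything that is not exactly a tag is an ordinary token.
            tokens.append(part)
    if open_span is not None:
        raise ValueError("start tag without an end tag")
    return tokens, spans


def parse_text(text):
    """Apply parse_line to every non-empty line of the input."""
    return [parse_line(line) for line in text.splitlines() if line.strip()]
```

With this approach a tag glued to a word, such as `<START:person>Pierre`, is not a tag at all: it survives the white-space split as a single ordinary token, which is why the spaces around the tags matter in a parser of this kind.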