If you only do it during training, the detector will not consider ":" as a possible split position at detection time. That explains your drop in accuracy.
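To illustrate the point, here is a minimal, self-contained sketch. It is a toy and not OpenNLP code: the class `EosCandidateDemo` and the method `candidatePositions` are hypothetical names invented for this example. It assumes only what is described in this thread, namely that split candidates are found by scanning for a fixed set of EOS characters, so a character missing from that set at detection time is never offered to the model.

```java
import java.util.ArrayList;
import java.util.List;

// Toy illustration only; this is NOT the OpenNLP implementation.
class EosCandidateDemo {

    // Returns the indices in text whose character is one of eosChars.
    // Only these positions could ever be classified as SPLIT or NO_SPLIT.
    static List<Integer> candidatePositions(String text, char[] eosChars) {
        List<Integer> positions = new ArrayList<>();
        for (int i = 0; i < text.length(); i++) {
            for (char eos : eosChars) {
                if (text.charAt(i) == eos) {
                    positions.add(i);
                    break;
                }
            }
        }
        return positions;
    }

    public static void main(String[] args) {
        String text = "Diagnosis: fracture. Therapy: cast.";
        char[] defaults = {'.', '!', '?'};        // set used at detection time
        char[] withColon = {'.', '!', '?', ':'};  // set used only in training

        // With the default set, the ":" positions are never candidates.
        System.out.println(candidatePositions(text, defaults));
        // With ":" added, they become candidates as well.
        System.out.println(candidatePositions(text, withColon));
    }
}
```

With the default set, the two ":" positions in the sample text do not appear among the candidates at all, so no model, however it was trained, could split there.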
It looks like it is not possible to modify the EOS characters properly with the current version. I suggest that you check out the source code and change the defaultEosCharacters array in opennlp.tools.sentdetect.Factory. That lets you run your test and get it working for now.

In any case, we should have an easy way to specify the EOS characters without implementing a custom Factory class. Please open a Jira issue to improve this.

Jörn

On Thu, Feb 9, 2012 at 9:21 AM, Katrin Tomanek <katrin.toma...@averbis.com> wrote:

> Hi Jörn,
>
> I only modified the training process.
>
> However, when I check the predictions, it turns out that the model never
> learns to split at ":" positions.
>
> It should be enough to modify the DefaultSDContextGenerator and the
> DefaultEndOfSentenceScanner so that these know about ":" as an EOS, right?
> Or are there other places where ":" should be added?
>
> Best
> Katrin
>
> On 02/09/2012 09:18 AM, Joern Kottmann wrote:
>
>> Did you modify the evaluation as well? If you only do it during training,
>> the evaluator will not be able to consider ":" as an EOS character.
>>
>> To me it sounds like it fails to split on the ":" in some places.
>>
>> The sentence detector uses a maxent model to classify every EOS character
>> as either a SPLIT or NO_SPLIT.
>>
>> Jörn
>>
>> On Thu, Feb 9, 2012 at 8:59 AM, Katrin Tomanek
>> <katrin.toma...@averbis.com> wrote:
>>
>>> Hi Willian,
>>>
>>> I am currently using opennlp-1.5.2 and try to use it as an API, i.e. not
>>> to modify this code but to write my own code around it. However, what I
>>> described below (with the SDEventStream) results in the same as you are
>>> describing: I am changing the set of EOS characters.
>>>
>>> I am just wondering why adding ":" as an EOS character decreases the
>>> results (dropping from ~80F to 45F in sentence splitting, and ":" is
>>> always a sentence boundary symbol in my data!)
>>>
>>> Looks like I need to debug a little bit more what's happening in the
>>> DefaultSDContextGenerator.
>
> --
> Dr. Katrin Tomanek
> Averbis GmbH
> Tennenbacher Strasse 11
> D-79106 Freiburg
>
> Fon: +49 (0) 761 - 203 97696
> Fax: +49 (0) 761 - 203 97694
> E-Mail: katrin.toma...@averbis.com
>
> Geschäftsführer: Dr. med. Philipp Daumke, Dr. Kornél Markó
> Sitz der Gesellschaft: Freiburg i. Br.
> AG Freiburg i. Br., HRB 701080