Hello!

I am restating my problem, as I think it is quite similar to the following issue: https://issues.apache.org/jira/browse/OPENNLP-602

I work with clinical narratives, so end-of-sentence (EOS) characters are very often simply missing, and I am trying to train a new, robust sentence model. The issue above suggests encoding these kinds of sentence endings with <CR><LF> or just <LF>. How do I set this up properly?

    char[] eosCharacters = {'!','?','.'};
    SentenceDetectorFactory sentenceFactory = new SentenceDetectorFactory("de", true, null, eosCharacters);

eosCharacters is a char array, so how do I put your suggested encodings '<CR><LF>' and '<LF>' into it?

How do I have to prepare my final training data set then? For example, suppose the text contains something like this (with an artificial line break in the middle of the sentence):

    Text:     The quick abbr. brown fox jumps over the lazy dog
    Training: The quick abbr. brown fox jumps over the lazy dog <CR><LF>

If the standard EOS characters {'.','?','!'} are present:

    Text:     The quick abbr. brown fox jumps over the lazy dog.
    Training: The quick abbr. brown fox jumps over the lazy dog.

If I have an abbreviation at the end of a sentence, do I have to encode this in a special way?

    Text:     The quick abbr. brown fox jumps over the lazy dog abbr.
    Training: The quick abbr. brown fox jumps over the lazy dog abbr.

Once I have trained my model, do I have to adapt the input text so that it contains e.g. <CR><LF> or <LF>, as used in the training sentences?

Thank you for your help!

Best regards,
Markus