The sentence detector will have a hard time learning from samples without an EOS (end-of-sentence) mark. This is common in article headlines, for example. I usually remove sentences with no clear EOS from the training/evaluation corpus. At runtime, I apply some code to put a sentence on its own line when I can clearly identify it as a complete headline.
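The corpus-cleaning step described above could look roughly like the sketch below. It is not part of any sentence-detector library; it only prepares a plain-text, one-sentence-per-line training file before it goes to the trainer. The EOS set (".", "!", "?") and the closing characters allowed after the mark are assumptions, and the helper names `has_clear_eos` and `filter_training_lines` are hypothetical.

```python
# Minimal sketch: drop training sentences that lack a clear end-of-sentence mark.
# Assumptions (not from the original thread): the corpus has one sentence per
# line, and a "clear EOS" means the line ends in ".", "!" or "?", optionally
# followed by closing quotes or brackets.

EOS_CHARS = {".", "!", "?"}        # hypothetical EOS set; adjust per language/domain
TRAILING_CLOSERS = "\"')]»”’"      # closers tolerated after the EOS mark


def has_clear_eos(sentence: str) -> bool:
    """Return True if the sentence ends with an EOS mark, ignoring trailing closers."""
    stripped = sentence.rstrip().rstrip(TRAILING_CLOSERS)
    return bool(stripped) and stripped[-1] in EOS_CHARS


def filter_training_lines(lines):
    """Split one-sentence-per-line input into (kept, dropped) lists."""
    kept, dropped = [], []
    for line in lines:
        text = line.strip()
        if not text:
            continue  # blank lines carry no sentence
        (kept if has_clear_eos(text) else dropped).append(text)
    return kept, dropped


if __name__ == "__main__":
    sample = [
        "Markets Rally After Rate Decision",  # headline with no EOS mark
        "The central bank cut rates by 0.25 points.",
        'He asked, "Is it final?"',
        "",
    ]
    kept, dropped = filter_training_lines(sample)
    print("kept:", kept)
    print("dropped:", dropped)
```

Returning the dropped lines instead of silently discarding them makes it easy to check how much of the corpus (headlines, for instance) is being excluded.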
Regards
William

2017-09-27 11:10 GMT-03:00 Gary Underwood <gunderw...@clinacuity.com>:
> The sentences for training are in the format of 1 per line, so it should be
> fine as it is (unless you have sentences that also span lines).
>
> Gary Underwood
> gunderw...@clinacuity.com
>
> > On Sep 27, 2017, at 6:49 AM, Markus Kreuzthaler <markus.kreuztha...@gmail.com> wrote:
> >
> > Hello!
> >
> > How do I have to prepare the training data for sentence detection when I
> > have cases where sentences end just with a newline character, without e.g. a
> > period / full stop at the end of the training sentence?
> >
> > Is there some special encoding for this case?
> >
> > Thank you for your help!
> >
> > Kind regards, Markus