Hi William!

I found this issue, which has apparently been fixed:
https://issues.apache.org/jira/browse/OPENNLP-602

So when I have a sentence like:

The quick brown
fox jumps over the lazy dog

I will encode my training sentence in one line as:

The quick brown fox <LF> jumps over the lazy dog <LF>

Even though I am not sure whether I should avoid the space after "dog", i.e.
switch to:

The quick brown fox <LF> jumps over the lazy dog<LF>

I will give it a try, but maybe someone can give me a hint as to which
version is correct.

Thank you!

Best regards, Markus


2017-09-27 17:44 GMT+02:00 William Colen <william.co...@gmail.com>:

> The sentence detector will have a hard time learning from samples without
> an EOS (end of sentence) mark. This is common in article headlines, for
> example.
> I usually remove sentences with no clear EOS from the training/evaluation
> corpus.
> At runtime, I apply some code to split sentences at new lines if I can
> clearly identify a line as a complete headline.
>
>
> Regards
> William
>
> 2017-09-27 11:10 GMT-03:00 Gary Underwood <gunderw...@clinacuity.com>:
>
> > The sentences for training are in the format of one per line, so it should
> > be fine as it is (unless you have sentences that also span lines).
> >
> > Gary Underwood
> > gunderw...@clinacuity.com
> >
> >
> >
> > > On Sep 27, 2017, at 6:49 AM, Markus Kreuzthaler <markus.kreuztha...@gmail.com> wrote:
> > >
> > > Hello!
> > >
> > > How should I prepare the training data for sentence detection when I
> > > have cases where sentences end with just a newline character, without,
> > > e.g., a period / full stop at the end of the training sentence?
> > >
> > > Is there some special encoding for this case?
> > >
> > > Thank you for your help!
> > >
> > > Best regards, Markus
> >
> >
>
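For illustration, William's preprocessing step (dropping training sentences that have no clear EOS mark) can be combined with the one-sentence-per-line format Gary describes in a small pass over the corpus. This is a minimal sketch in plain Java, not part of the OpenNLP API: the class `EosFilter`, its methods, and the assumed EOS set `.`, `!`, `?` are hypothetical and should be adapted to the corpus and language at hand.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical preprocessing helper (not part of OpenNLP): keeps only
// training sentences that end with a clear end-of-sentence (EOS) mark.
public class EosFilter {

    // Assumed EOS characters; adjust for your corpus and language.
    private static final String EOS_CHARS = ".!?";

    // Returns true if the trimmed sentence ends with one of EOS_CHARS.
    static boolean hasClearEos(String sentence) {
        String s = sentence.trim();
        return !s.isEmpty() && EOS_CHARS.indexOf(s.charAt(s.length() - 1)) >= 0;
    }

    // Drops sentences without a clear EOS and writes the rest
    // one sentence per line, each terminated by '\n'.
    static String toTrainingText(List<String> sentences) {
        List<String> kept = new ArrayList<>();
        for (String sentence : sentences) {
            if (hasClearEos(sentence)) {
                kept.add(sentence.trim());
            }
        }
        return kept.isEmpty() ? "" : String.join("\n", kept) + "\n";
    }

    public static void main(String[] args) {
        List<String> corpus = List.of(
                "Quick brown fox spotted in local park", // headline, no EOS: dropped
                "The quick brown fox jumps over the lazy dog.",
                "Did the dog wake up?");
        System.out.print(toTrainingText(corpus));
    }
}
```

Filtering rather than marking such lines keeps ambiguous samples out of training entirely; splitting headlines at runtime, as William describes, then happens outside the model.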
