Re: new line as a end-of-sentence character

William Colen Wed, 27 Sep 2017 08:44:40 -0700

The sentence detector will have a hard time learning from samples without
an EOS (end-of-sentence) mark. This is common in article headlines, for
example.
I usually remove sentences with no clear EOS from the training/evaluation
corpus.
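
A minimal sketch of such a filter, assuming a one-sentence-per-line training
file and treating ".", "!" and "?" as the only EOS marks. The names
has_clear_eos and filter_corpus are hypothetical helpers, not part of any
library, and the mark list is an assumption to adjust per corpus.

```python
# Hypothetical helpers: keep only training lines that end with a clear EOS mark.
EOS_MARKS = (".", "!", "?")   # assumption: adjust for your language and corpus
CLOSERS = "\"')]}»”’"         # quotes/brackets that may legitimately follow the EOS mark


def has_clear_eos(line: str) -> bool:
    """Return True if the line ends with an EOS mark, ignoring trailing closers."""
    text = line.strip().rstrip(CLOSERS)
    return text.endswith(EOS_MARKS)


def filter_corpus(lines):
    """Drop lines without a clear EOS (blank lines are dropped as well)."""
    return [line.strip() for line in lines if has_clear_eos(line)]
```

Note that this sketch also discards blank lines, so keep them separately if
your training file uses them as separators.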
At runtime, I apply some code to split the text at new lines when I can
clearly identify a line as a complete headline.
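
One way such a runtime pre-split could look, as a sketch only. The heuristic
is an assumption: a headline is taken to be a short, single-line paragraph
(set off by blank lines) that does not end in ".", "!" or "?". The names
looks_like_headline and presplit and the max_words threshold are hypothetical.

```python
import re

EOS_MARKS = (".", "!", "?")  # assumption: same EOS marks as the training data


def looks_like_headline(paragraph: str, max_words: int = 12) -> bool:
    """Heuristic: a short one-line paragraph that does not end with an EOS mark."""
    lines = paragraph.strip().splitlines()
    if len(lines) != 1:
        return False
    text = lines[0].strip()
    return bool(text) and not text.endswith(EOS_MARKS) and len(text.split()) <= max_words


def presplit(text: str):
    """Split text into (chunk, is_headline) pairs, one per blank-line-separated paragraph."""
    chunks = []
    for paragraph in re.split(r"\n\s*\n", text):
        if not paragraph.strip():
            continue
        if looks_like_headline(paragraph):
            chunks.append((paragraph.strip(), True))
        else:
            # Join hard-wrapped lines so the sentence detector sees running text.
            chunks.append((" ".join(paragraph.split()), False))
    return chunks
```

Chunks flagged as headlines can be emitted as sentences as they are; the
remaining chunks are passed to the sentence detector as usual.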



Regards
William

2017-09-27 11:10 GMT-03:00 Gary Underwood <gunderw...@clinacuity.com>:

> The sentences for training are in the format of 1 per line so it should be
> fine as it is (unless you have sentences that also span lines).
>
> Gary Underwood
> gunderw...@clinacuity.com
>
>
>
> > On Sep 27, 2017, at 6:49 AM, Markus Kreuzthaler <
> markus.kreuztha...@gmail.com> wrote:
> >
> > Hello!
> >
> > How do I have to prepare the training data for sentence detection when I
> > have cases where sentences end just via a new line char, without e.g. a
> > period character / full stop at the end of the training sentence?
> >
> > Is there some special encoding for this case?
> >
> > Thank you for your help!
> >
> > lg Markus
>
>
