Re: Training data for English sentence segmentation

Jason Baldridge Mon, 07 May 2012 14:10:32 -0700

It comes from the Penn Treebank, and is not accessible to those who don't
already have that data. We're making a push to switch to models trained on open data,
such as the Open American National Corpus. More to come on that in the
coming weeks.


On Wed, May 2, 2012 at 11:23 AM, Juan Miguel Cejuela
<[email protected]> wrote:

> Hi,
>
> in the models list page, it's written that the EN sentence detector uses
> opennlp training data. Is it possible to access this training data? Besides
> this, which other training corpora are for EN sentence segmentation?
>
>
> Much appreciated
>
> --
> Juan Miguel Cejuela
>



-- 
Jason Baldridge
Associate Professor, Department of Linguistics
The University of Texas at Austin
http://www.jasonbaldridge.com
http://twitter.com/jasonbaldridge
