It comes from the Penn Treebank and is not accessible to those who don't have the data. We're making a push to switch to models trained on open data, such as the Open American National Corpus. More on that in the coming weeks.
On Wed, May 2, 2012 at 11:23 AM, Juan Miguel Cejuela <[email protected]> wrote:

> Hi,
>
> in the models list page, it's written that the EN sentence detector uses
> opennlp training data. Is it possible to access this training data? Besides
> this, which other training corpora are for EN sentence segmentation?
>
>
> Much appreciated
>
> --
> Juan Miguel Cejuela
>

--
Jason Baldridge
Associate Professor, Department of Linguistics
The University of Texas at Austin
http://www.jasonbaldridge.com
http://twitter.com/jasonbaldridge
