Chris,

Unfortunately, most, if not all, of the training data is not free or openly available because of copyright restrictions. If you would like to start a group to collect non-copyrighted text and parse it by hand, you are more than welcome and encouraged to do so. Jorn or Jason may have a more complete set of training data and could help if you pass on your samples.
James

On 2/13/2011 11:03 PM, Chris Spencer wrote:
> Where would we download the source data and tools used to generate the
> pretrained models available at
> http://opennlp.sourceforge.net/models-1.5/, specifically for the
> English Treebank Parser?
>
> I have a large corpus of hand-corrected sentence/parse-tree pairs, as
> well as an extended lexicon, and I'd like to incorporate these into
> the training data and retrain a new parser better fitted for my
> domain.
>
> Regards,
> Chris
