Michael,

The inability to redistribute training data is a current problem with
retraining and improving models:

https://cwiki.apache.org/OPENNLP/opennlp-annotations.html

Also, see this discussion about "OpenNLP Annotations Proposal" on the
opennlp-dev list:

http://mail-archives.apache.org/mod_mbox/incubator-opennlp-dev/201106.mbox/thread

It might take a little while to get this going, but we're all very keen to
make progress on it!

Jason

On Fri, Jun 10, 2011 at 12:27 PM, Michael Schmitz
<sch...@cs.washington.edu> wrote:

> Hi, I was wondering if the training data for the OpenNLP maxent POS tagger
> models is public and available somewhere. I would like to train models for
> the POS tagger and the chunker that work on sentences without case (i.e.
> all capitalized). If I had the training data used for en-pos-maxent.bin, a
> first pass would simply mean capitalizing the tokens and running the
> trainer. It appears that the chunker training data comes from CoNLL-2000
> (http://www.cnts.ua.ac.be/conll2000/chunking/).
>
> I would be happy to share the models with OpenNLP if anyone thought they
> would be of use to others.
>
> Peace.  Michael
>
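The first pass Michael describes (upper-casing the tokens of existing POS training data, then retraining) might be sketched as below. This assumes the OpenNLP POS training format of one sentence per line with whitespace-separated `word_TAG` tokens; the function names `uppercase_pos_line` and `uppercase_pos_file` are hypothetical helpers, not part of OpenNLP. Only the word is upper-cased and the tag is left alone, and the split is on the last `_` so words that themselves contain underscores survive intact.

```python
def uppercase_pos_line(line: str, sep: str = "_") -> str:
    """Upper-case the word part of each word<sep>TAG token, leaving tags untouched.

    Hypothetical helper; assumes OpenNLP-style 'word_TAG' tokens separated by whitespace.
    """
    out = []
    for token in line.split():
        # Split on the LAST separator so words like 'New_York' keep their underscore.
        word, found, tag = token.rpartition(sep)
        if not found:
            # No separator present: treat the whole token as a word.
            out.append(token.upper())
        else:
            out.append(word.upper() + sep + tag)
    return " ".join(out)


def uppercase_pos_file(src_path: str, dst_path: str) -> None:
    """Write an upper-cased copy of a POS training file, one sentence per line."""
    with open(src_path, encoding="utf-8") as src, \
         open(dst_path, "w", encoding="utf-8") as dst:
        for line in src:
            dst.write(uppercase_pos_line(line) + "\n")
```

For example, `uppercase_pos_line("The_DT dog_NN barks_VBZ ._.")` yields `"THE_DT DOG_NN BARKS_VBZ ._."`. The resulting file would then be passed to the usual POS tagger trainer; the same idea applies to the CoNLL-2000 chunker data by upper-casing only the first (word) column of each line.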



-- 
Jason Baldridge
Assistant Professor, Department of Linguistics
The University of Texas at Austin
http://www.jasonbaldridge.com
http://twitter.com/jasonbaldridge
