Hi, I was wondering if the training data for the OpenNLP maxent POS tagger
models is public and available somewhere.  I would like to train models for
the POS tagger and the chunker that work on sentences without case (i.e. all
uppercase).  If I had the training data used for en-pos-maxent.bin, a
first pass would simply mean uppercasing the tokens and running the
trainer.  It appears that the chunker training data comes from CoNLL-2000
(http://www.cnts.ua.ac.be/conll2000/chunking/).
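
To make the first pass concrete, here is a rough sketch of the uppercasing
step.  It assumes the POS training data is in the usual OpenNLP layout (one
sentence per line, tokens written as word_TAG) and that the chunker data
uses the CoNLL-2000 layout (one "word POS chunk" triple per line, blank
line between sentences).  The function names are my own, and the trainer
invocation itself is not shown.

```python
def uppercase_pos_line(line: str, sep: str = "_") -> str:
    """Uppercase the word part of each word_TAG token; tags stay untouched."""
    out = []
    for token in line.split():
        # Split on the LAST separator so words containing "_" survive.
        word, found, tag = token.rpartition(sep)
        if not found:
            # No separator in this token (assumed malformed): keep as-is.
            out.append(token)
        else:
            out.append(word.upper() + sep + tag)
    return " ".join(out)


def uppercase_conll_line(line: str) -> str:
    """Uppercase the word column of a CoNLL-2000 'word POS chunk' line."""
    parts = line.split()
    if len(parts) != 3:
        # Blank sentence separators (or unexpected lines) pass through.
        return line.rstrip("\n")
    parts[0] = parts[0].upper()
    return " ".join(parts)


if __name__ == "__main__":
    print(uppercase_pos_line("The_DT cat_NN sat_VBD ._."))
    print(uppercase_conll_line("Confidence NN B-NP"))
```

Only the word is changed; the tags are left alone so the trainer still sees
the original label set.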

I would be happy to share the models with OpenNLP if anyone thought they
would be of use to others.

Peace.  Michael
