On 10/01/2013 05:01 PM, Michael Schmitz wrote:
Hi, I've used OpenNLP for a few years--in particular the chunker, POS
tagger, and tokenizer. We're grateful for a high performance library
with an Apache license, but one of our greatest complaints is the
quality of the models. Yes--we're aware we can train our own--but
most people are looking for something that is good enough out of the
box (we aim for this with our products). I'm not surprised that
volunteer engineers don't want to spend their time annotating data;-)
OpenNLP partly addressed this issue with the formats package: quite a few
existing corpora can now be used to create OpenNLP models. Personally, I
don't feel it's worth going through all the license issues to redistribute
these models; I rather think we should make sure they can be created
easily with the trainer tools we have.
Jörn