On 1/12/12 3:42 PM, Svetoslav Marinov wrote:
Hi all,
There is a Perceptron model for the Swedish POS tagger. How does one call it
with the API? I checked the API pages as well as the documentation, but there
is only a reference to the MaxEnt model:
POSTaggerME tagger = new POSTaggerME(model);
So what is the method for using the Perceptron model?
The decision is made at training time: depending on the settings, either
maxent or perceptron is used to train the model. The resulting model can
be loaded with the code above, and OpenNLP takes care of setting
everything up correctly behind the scenes.
We distribute a perceptron model for English.
For information about how to set the training algorithm please consult
our documentation:
http://incubator.apache.org/opennlp/documentation/1.5.2-incubating/manual/opennlp.html#tools.postagger.training
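As a rough sketch of what that section describes, a training parameters file that selects the perceptron could look like the one below. The iteration and cutoff values are illustrative assumptions only, not the settings used for any of the distributed models.

```properties
# Hypothetical training parameters file (values chosen only for illustration)
# Algorithm selects the trainer: MAXENT or PERCEPTRON
Algorithm=PERCEPTRON
# Number of training iterations
Iterations=300
# Minimum number of times a feature must be seen to be kept
Cutoff=0
```

Such a file is normally handed to the command line trainer through its -params option. Please check the linked section for the exact invocation in your version.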
I am also curious about the performance of the trained models. Is there any
reference to precision/recall? Can one get in touch with the people who have
trained the models available?
If one creates a new model (say for sentence detection or POS tagging with
a different set of POS tags), can one upload it?
We currently don't have a way to share models or take care of their
distribution, mostly because of copyright/legal issues.
The way we think this should be fixed is to share open source training data.
Anyway, we have some instructions on how to train the POS tagger on
various public corpora in our documentation.
I suggest that you take a look there:
http://incubator.apache.org/opennlp/documentation/1.5.2-incubating/manual/opennlp.html#tools.corpora
Hope that helps,
Jörn