Hi all,

As proposed earlier, I think we should go ahead and define/implement the training parameters format and classes. We need to define the format and decide how to change our current training implementation.

I believe it should be part of OpenNLP Tools and not the maxent package, for two reasons: first, it should be possible to define parameters for multiple models, while maxent only deals with one model at a time; second, the new API does not depend on maxent (which will be replaced with opennlp-ml).

The parser contains multiple models; maybe someone wants to train one of them with perceptron and another with maxent, or experiment with cutoff and iterations for a certain model.

I propose that we simply use a java properties file.

For the name finder it could look like this:
Algorithm=MAXENT
Iterations=150
Cutoff=4
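
Such a file maps directly onto java.util.Properties. A minimal sketch of loading it into a Map<String, String> (the class and method names here are illustrative, not part of the proposal):

```java
import java.io.StringReader;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

public class ParamsDemo {

    // Reads training parameters in properties format into a String map.
    public static Map<String, String> load(String propsText) {
        java.util.Properties props = new java.util.Properties();
        try {
            props.load(new StringReader(propsText));
        } catch (IOException e) {
            // Cannot happen when reading from an in-memory String
            throw new IllegalStateException(e);
        }
        Map<String, String> params = new HashMap<>();
        for (String name : props.stringPropertyNames()) {
            params.put(name, props.getProperty(name));
        }
        return params;
    }

    public static void main(String[] args) {
        Map<String, String> params =
            load("Algorithm=MAXENT\nIterations=150\nCutoff=4\n");
        System.out.println(params.get("Algorithm"));  // MAXENT
        System.out.println(params.get("Iterations")); // 150
    }
}
```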

Or for the parser:
build.Algorithm=MAXENT
build.Iterations=180
build.Threads=4
check.Algorithm=MAXENT
check.Iterations=120
check.Threads=2
tagger.Algorithm=PERCEPTRON
tagger.Iterations=130
tagger.Cutoff=0

The maxent package will provide a small util which can validate the parameters for a certain algorithm
and then do the training according to the parameters.

That could look like this:
isValid(Map<String, String> params);
train(Map<String, String> params, EventStream events);
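
To illustrate what isValid might do, here is a sketch; the validation rules (known algorithm name, numeric values for iteration/cutoff/thread parameters) are my assumptions, not a spec:

```java
import java.util.Map;

public class ParamValidator {

    // Checks that the map names a known algorithm and that numeric
    // parameters parse as integers. The rules are illustrative only.
    public static boolean isValid(Map<String, String> params) {
        String algorithm = params.get("Algorithm");
        if (!"MAXENT".equals(algorithm) && !"PERCEPTRON".equals(algorithm)) {
            return false;
        }
        for (String key : new String[] {"Iterations", "Cutoff", "Threads"}) {
            String value = params.get(key);
            if (value != null) {
                try {
                    Integer.parseInt(value);
                } catch (NumberFormatException e) {
                    return false;
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isValid(Map.of("Algorithm", "MAXENT", "Iterations", "150"))); // true
        System.out.println(isValid(Map.of("Algorithm", "SVM")));                         // false
    }
}
```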

Depending on the model that should be trained, the training parameters can be narrowed down by providing a name space.

To train the build model in the sample above, the following would be done:
TrainingParameters.getParams("build");
which returns a Map<String, String> with this content:
Algorithm=MAXENT
Iterations=180
Threads=4
This map is then passed to the train method to train the model based on the provided event stream.
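
The name space lookup could be a simple prefix filter; a sketch under that assumption (class and constructor shape are illustrative):

```java
import java.util.HashMap;
import java.util.Map;

public class TrainingParameters {

    private final Map<String, String> properties;

    public TrainingParameters(Map<String, String> properties) {
        this.properties = properties;
    }

    // Returns the parameters of the given name space with the prefix
    // stripped, e.g. "build.Iterations" becomes "Iterations" for
    // getParams("build"). Keys without the prefix are omitted.
    public Map<String, String> getParams(String namespace) {
        Map<String, String> result = new HashMap<>();
        String prefix = namespace + ".";
        for (Map.Entry<String, String> entry : properties.entrySet()) {
            if (entry.getKey().startsWith(prefix)) {
                result.put(entry.getKey().substring(prefix.length()),
                           entry.getValue());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> all = new HashMap<>();
        all.put("build.Algorithm", "MAXENT");
        all.put("build.Iterations", "180");
        all.put("build.Threads", "4");
        all.put("check.Algorithm", "MAXENT");

        Map<String, String> build = new TrainingParameters(all).getParams("build");
        System.out.println(build.size());              // 3
        System.out.println(build.get("Iterations"));   // 180
    }
}
```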

Any opinions ?

I am +1 to do this change for 1.5.2, but we need to maintain strict backward compatibility.

Jörn
