I committed it, but it still needs a little fine tuning here and there.
Anyway, I think it's very interesting to see how the perceptron model
performs in our different components.

I did a quick test with the name finder, and there I got higher recall and lower
precision compared to the maxent model. It was also quite a bit faster.

You can now pass a properties file to the cmd line interface with -params, which
could look like this:
Algorithm=PERCEPTRON <- or MAXENT
Iterations=100
Cutoff=0

The above works for the tokenizer, sentence detector, name finder, chunker, pos tagger
and doccat. The properties file for the parser is slightly more complex.
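For anyone curious how such a file gets parsed: the key=value format above is exactly what java.util.Properties understands, so reading it is a one-liner. A minimal stand-alone sketch (using only the JDK, not the actual OpenNLP loading code):

```java
import java.io.StringReader;
import java.util.Properties;

public class ParamsFileDemo {
    public static void main(String[] args) throws Exception {
        // Same contents as the example params file above
        String paramsFile = "Algorithm=PERCEPTRON\n"
                + "Iterations=100\n"
                + "Cutoff=0\n";

        // java.util.Properties parses the simple key=value format
        Properties params = new Properties();
        params.load(new StringReader(paramsFile));

        String algorithm = params.getProperty("Algorithm");
        int iterations = Integer.parseInt(params.getProperty("Iterations"));
        int cutoff = Integer.parseInt(params.getProperty("Cutoff"));

        System.out.println(algorithm + " " + iterations + " " + cutoff);
        // prints: PERCEPTRON 100 0
    }
}
```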

Jörn

On 5/18/11 12:32 PM, Jörn Kottmann wrote:
On 5/18/11 12:13 PM, Tommaso Teofili wrote:

Sounds good to me, I agree 1.5.2 should be backward compatible, eventually
leaving the "old way" in 1.6.x.

Currently we have a set of train methods which all take the number of iterations and the cutoff. Only the POS Tagger train method has the ability to also take a
model type parameter.

I will add a new set of train methods which take a params object instead of the iterations and cutoff. We could then decide to deprecate all the old train methods.
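Roughly, the idea is to collapse the per-setting arguments into one object, something like this sketch (class and method names here are illustrative, not the actual OpenNLP API):

```java
import java.util.Properties;

// Hypothetical parameter holder, standing in for the real params object
class TrainParams {
    private final Properties props = new Properties();

    TrainParams(String algorithm, int iterations, int cutoff) {
        props.setProperty("Algorithm", algorithm);
        props.setProperty("Iterations", Integer.toString(iterations));
        props.setProperty("Cutoff", Integer.toString(cutoff));
    }

    String algorithm()  { return props.getProperty("Algorithm"); }
    int iterations()    { return Integer.parseInt(props.getProperty("Iterations")); }
    int cutoff()        { return Integer.parseInt(props.getProperty("Cutoff")); }
}

class TrainerSketch {
    // Old style: each setting is a separate argument, no room for new ones
    @Deprecated
    static String train(String data, int iterations, int cutoff) {
        return train(data, new TrainParams("MAXENT", iterations, cutoff));
    }

    // New style: one params object carries algorithm choice and any future settings
    static String train(String data, TrainParams params) {
        return params.algorithm() + "-model(" + data + ")";
    }

    public static void main(String[] args) {
        TrainParams params = new TrainParams("PERCEPTRON", 100, 0);
        System.out.println(train("train.txt", params));
    }
}
```

The advantage is that adding a new training setting later only means a new key in the params object, not another overload of every train method.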

The cmd line interface can also additionally accept a parameter file, but the iterations
and cutoff arguments will stay in place until 1.6.

I started working on it already and did a couple of tests with the name finder and
perceptron instead of maxent.

Jörn

