I committed it, but it still needs a little fine-tuning here and there.
Anyway, I think it's very interesting for many to see how the perceptron model
performs in our different components.
I did a quick test with the name finder, and there I had higher recall and lower
precision compared to the maxent model. It was also quite a bit faster.
You can now pass in a properties file on the cmd line interface with -params,
which could look like this:
Algorithm=PERCEPTRON
# or Algorithm=MAXENT
Iterations=100
Cutoff=0
The above works for the tokenizer, sentence detector, name finder, chunker,
POS tagger and doccat. The properties file for the parser is slightly more complex.
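For reference, a file like the one above follows standard Java properties
syntax, so it can be read with java.util.Properties. Here is a minimal sketch
of parsing it (the key names are taken from the example above; the fallback
default values are just illustrative assumptions, not what OpenNLP uses):

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

public class ParamsExample {

    public static void main(String[] args) throws IOException {
        // Inline copy of the example params file from above
        String params = "Algorithm=PERCEPTRON\nIterations=100\nCutoff=0\n";

        Properties props = new Properties();
        props.load(new StringReader(params));

        // Read the training settings, falling back to made-up defaults if absent
        String algorithm = props.getProperty("Algorithm", "MAXENT");
        int iterations = Integer.parseInt(props.getProperty("Iterations", "100"));
        int cutoff = Integer.parseInt(props.getProperty("Cutoff", "5"));

        System.out.println(algorithm + " " + iterations + " " + cutoff);
    }
}
```

The same Properties object could then be handed to whatever train method ends
up accepting a params object.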
Jörn
On 5/18/11 12:32 PM, Jörn Kottmann wrote:
On 5/18/11 12:13 PM, Tommaso Teofili wrote:
Sounds good to me, I agree 1.5.2 should be backward compatible, eventually
leaving the "old way" in 1.6.x.
Currently we have a set of train methods which all take the number of iterations
and the cutoff. Only the POS Tagger train method can also take a model type
parameter.
I will add a new set of train methods which take a params object instead of the
iterations and cutoff. We could then decide to deprecate all the old train
methods.
The cmd line interface can additionally accept a parameter file, but the
iterations and cutoff arguments will stay in place until 1.6.
I started working on it already and did a couple of tests with the name finder
and the perceptron instead of maxent.
Jörn