Another clarification on this, just in case it is useful:

The chunking parser trains its models in the order "build, tagger,
chunker and check", and the settings for each of these steps must be
given under the matching prefix in the training parameters file. For example:

Algorithm=MAXENT
build.Iterations=200
tagger.Iterations=200
chunker.Iterations=200
check.Iterations=200
build.Cutoff=4
tagger.Cutoff=4
chunker.Cutoff=4
check.Cutoff=4
build.Threads=4
tagger.Threads=4
chunker.Threads=4
check.Threads=4
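
In case it helps, here is a minimal Python sketch that writes such a
file from one set of shared values, so the four prefixes cannot drift
apart. The helper name build_params is my own invention for
illustration and is not part of OpenNLP; only the prefixes and the
parameter names are taken from the file above.

```python
# Sketch: generate a prefixed parameter file for the chunking parser.
# PREFIXES and the key names mirror the example above; build_params is a
# hypothetical helper, not part of OpenNLP.
PREFIXES = ["build", "tagger", "chunker", "check"]


def build_params(settings, algorithm="MAXENT", prefixes=PREFIXES):
    """Return the file contents as a string, one key=value per line."""
    lines = [f"Algorithm={algorithm}"]
    for key, value in settings.items():
        # Repeat every shared setting once per model prefix.
        for prefix in prefixes:
            lines.append(f"{prefix}.{key}={value}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(build_params({"Iterations": 200, "Cutoff": 4, "Threads": 4}), end="")
```

Redirecting the output to a file gives something you can pass to
-params; per-model values would need a small extension of the helper.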

Of course, if you insert a better POS model into the chunking parser
model, you can simply ignore the tagger parameters, and so on.
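
Since unprefixed keys did not take effect in my runs (the defaults were
used instead), and presumably the same happens with a misspelled key, a
quick check of the file before training may save a run. A rough Python
sketch: check_params is a hypothetical helper, not an OpenNLP tool, the
set of known names covers only what appears in this thread, and
treating Algorithm as valid without a prefix is an assumption based on
the example above.

```python
# Sketch: flag keys in a parser parameter file that would likely be ignored.
# KNOWN_KEYS lists only the names used in this thread; extend as needed.
PREFIXES = {"build", "tagger", "chunker", "check"}
KNOWN_KEYS = {"Algorithm", "Iterations", "Cutoff", "Threads"}
UNPREFIXED_OK = {"Algorithm"}  # assumption: written without a prefix


def check_params(text):
    """Return a list of warnings for suspicious lines in the file contents."""
    warnings = []
    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        name = line.split("=", 1)[0].strip()
        prefix, dot, key = name.rpartition(".")
        if not dot:
            if key not in UNPREFIXED_OK:
                warnings.append(f"line {number}: '{name}' has no prefix")
        elif prefix not in PREFIXES:
            warnings.append(f"line {number}: unknown prefix '{prefix}'")
        elif key not in KNOWN_KEYS:
            warnings.append(f"line {number}: unknown parameter '{key}'")
    return warnings
```

Calling check_params on the contents of lang/TrainerParams.txt and
printing the result is enough; an empty list means nothing looked off.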

Cheers,

Rodrigo

On Thu, May 2, 2013 at 4:18 PM, Rodrigo Agerri <[email protected]> wrote:
> Thanks Jörn, that worked.
>
> Just in case anyone is wondering about the 4 steps Jörn mentioned, I
> looked at the chunking/Parser.java code again and found the reference
> to the author of the parsing approach used by the chunker parser
> (based on MaxEnt), whose thesis can be found here:
>
> http://www.ircs.upenn.edu/download/techreports/1998/98-15.pdf
>
> As the first two steps (tag and chunk, in this order) are already
> provided by the training data you can configure the other two (build
> and check, in this order) in the lang/TrainerParams.txt as you
> suggested:
>
> build.Cutoff=2
> build.Iterations=200
> build.Threads=4
>
> check.Cutoff=2
> check.Iterations=200
> check.Threads=4
>
> Cheers,
>
> Rodrigo
>
> On Tue, Apr 30, 2013 at 9:46 PM, Joern Kottmann <[email protected]> wrote:
>> Short answer from my phone: instead of Cutoff, the parameter name is
>> check.Cutoff (check.Cutoff=0, for example). I will have a closer look
>> tomorrow and reply on the list. It would be nice to have a sample
>> parameter file for the parser checked in.
>>
>> Cheers Jörn
>>
>> On Apr 30, 2013 7:50 PM, "Rodrigo Agerri" <[email protected]> wrote:
>>>
>>> Hi,
>>>
>>> Thanks for your answers, I will explain myself better.
>>>
>>> I edit the lang/TrainerParams.txt file where I specify, for example:
>>>
>>> Algorithm=MAXENT
>>> Iterations=1000
>>> Cutoff=0
>>> Threads=4
>>>
>>> Then I run the ParserTrainer from the CLI:
>>>
>>> bin/opennlp ParserTrainer -headRules
>>> /home/ragerri/experiments/parsing/opennlp/es/data/es-head-rules
>>> -parserType CHUNKING -params lang/TrainerParams.txt -lang es -model
>>> test.bin -encoding UTF-8 -data
>>> /home/ragerri/experiments/parsing/ancora-2.0/ancora2.treebank
>>>
>>> It trains fine, and the model works fine in a system using the Apache
>>> OpenNLP API, but it still uses a cutoff of 5 and 100 iterations, which
>>> seem to be the default training parameters for ParserTrainer.
>>>
>>> I can change these parameters for parser training using the API, that
>>> works fine, but I cannot manage to do it from the command line.
>>>
>>> I did not understand your suggestion, Jörn, could you please provide
>>> an example?
>>>
>>> Thanks,
>>>
>>> Rodrigo
>>>
>>>
>>>
>>> On Tue, Apr 30, 2013 at 4:21 PM, Jörn Kottmann <[email protected]> wrote:
>>> > On 04/30/2013 04:03 PM, William Colen wrote:
>>> >>
>>> >> Are you using the command line tool? If yes, you should pass the path to
>>> >> the parameters file in the command line argument -params <file-path>
>>> >>
>>> >
>>> > The parser trains multiple models; to make the parameters work they are
>>> > prefixed. The prefixes for the four models are: tagger, chunker, check and
>>> > build. Just put them in front of the usual parameter names.
>>> >
>>> > HTH,
>>> > Jörn
