Re: problem in training a new portuguese model

Jim - FooBar(); Thu, 10 Jan 2013 14:11:50 -0800

Your data doesn't look right... It should look like this (from the docs):


About_IN 10_CD Euro_NNP ,_, I_PRP reckon_VBP ._.
That_DT sounds_VBZ good_JJ ._.

One sentence per line, of course...
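
To make the expected format concrete, here is a minimal sketch in Python (not part of OpenNLP) that flags tokens which are not of the form word_TAG. The helper names are hypothetical, and splitting on the last underscore is an assumption about how the tag is separated from the word.

```python
def split_token(token):
    """Split 'word_TAG' on the last underscore; return None if malformed."""
    word, sep, tag = token.rpartition("_")
    if not sep or not word or not tag:
        return None
    return word, tag


def bad_tokens(line):
    """Return the tokens on one line that are not of the form word_TAG."""
    return [tok for tok in line.split() if split_token(tok) is None]


def check(lines):
    """Return (line_number, bad_tokens) pairs; blank lines are skipped."""
    report = []
    for number, line in enumerate(lines, start=1):
        bad = bad_tokens(line)
        if bad:
            report.append((number, bad))
    return report


# The two example sentences from the docs, one per line.
sample = [
    "About_IN 10_CD Euro_NNP ,_, I_PRP reckon_VBP ._.",
    "That_DT sounds_VBZ good_JJ ._.",
]
print(check(sample))  # prints []
```

To run it on the real training file, something like check(open("pt-pos.train", encoding="utf-8")) should work, assuming the file really is UTF-8.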

Apart from this, the message you're getting suggests that you're not using the command correctly and you're being redirected to the help message. However, the command you posted seems correct to me! Strange...
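
One thing that might be worth checking is whether every flag in the command also appears in the usage text. The sketch below is plain Python (not part of OpenNLP); both strings are copied from this thread, with the space in "-params paramsFile" assumed to have been lost in the archive. The flags() helper is hypothetical.

```python
import re

# Usage line and command as posted in this thread.
usage = (
    "Usage: opennlp POSTaggerTrainer [-type maxent|perceptron|perceptron_sequence] "
    "[-dict dictionaryPath] [-ngram cutoff] [-params paramsFile] -lang language "
    "[-cutoff num] [-iterations num] [-encoding charsetName] -data trainData -model "
    "modelFile"
)
command = (
    "opennlp POSTaggerTrainer -lang pt -model-type maxent -encoding utf-8 "
    "-data pt-pos.train -model pt-pos-maxent.bin"
)


def flags(text):
    """Collect tokens that look like flags: a '-' not glued to a preceding
    word character, followed by a letter (so 'utf-8' and 'pt-pos' are skipped)."""
    return set(re.findall(r"(?<![\w-])-[A-Za-z][\w-]*", text))


# Flags used in the command that the usage text does not mention.
unknown = sorted(flags(command) - flags(usage))
print(unknown)  # prints ['-model-type']
```

The usage text lists -type rather than -model-type, so that mismatch may be what sends the tool to the help message. That is only a guess from the output posted here.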

I'm not sure if this helps you... Did you try using the API? Do you know Java or some other JVM-hosted language?


Jim

On 10/01/13 20:31, Leonel de Alencar wrote:


  Hi!

I'm trying to train a new Brazilian Portuguese tagger model from the MAC-Morpho 
corpus.
I used the command below, but no model was created. Instead, I just got a usage 
message.

opennlp POSTaggerTrainer -lang pt -model-type maxent -encoding utf-8 -data 
pt-pos.train -model pt-pos-maxent.bin
Usage: opennlp POSTaggerTrainer [-type maxent|perceptron|perceptron_sequence] 
[-dict dictionaryPath] [-ngram cutoff] [-params paramsFile] -lang language 
[-cutoff num] [-iterations num] [-encoding charsetName] -data trainData -model 
modelFile

Arguments description:
         -type maxent|perceptron|perceptron_sequence
                 The type of the token name finder model. One of 
maxent|perceptron|perceptron_sequence.
         -dict dictionaryPath
                 The XML tag dictionary file
         -ngram cutoff
                 NGram cutoff. If not specified will not create ngram 
dictionary.
         -params paramsFile
                 Training parameters file.
         -lang language
                 specifies the language which is being processed.
         -cutoff num
                 specifies the min number of times a feature must be seen. It 
is ignored if a parameters file is passed.
         -iterations num
                 specifies the number of training iterations. It is ignored if 
a parameters file is passed.
         -encoding charsetName
                 specifies the encoding which should be used for reading and 
writing text. If not specified the system default will be used.
         -data trainData
                 the data to be used during training
         -model modelFile
                 the output model file

Here is an excerpt from the pt-pos.train file:

Jersei_N atinge_V média_N de_PREP Cr$_CUR 1,4_NUM milhão_N em_PREP|+ a_ART 
venda_N de_PREP|+ a_ART Pinhal_NPROP em_PREP São_NPROP Paulo_NPROP ._.

Programe_V sua_PROADJ viagem_N a_PREP|+ a_ART Exposição_NPROP Nacional_NPROP 
do_NPROP Zebu_NPROP ,_, que_PRO-KS-REL começa_V dia_N 25_N|AP ._.

Safra_N recorde_ADJ e_KC disponibilidade_N de_PREP crédito_N ativam_V vendas_N 
de_PREP máquinas_N agrícolas_ADJ ._.

A_ART degradação_N de_PREP|+ as_ART terras_N por_PREP|+ o_ART mau_ADJ uso_N 
de_PREP|+ os_ART solos_N avança_V em_PREP|+ o_ART ._.

A_ART desertificação_N tornou_V crítica_ADJ a_ART produtividade_N de_PREP 
52_NUM mil_NUM km²_N em_PREP|+ a_ART região_N ._.



I would appreciate it if someone could help me!

Best,
Leonel
