Your data doesn't look right... It should look like this (from the docs):
About_IN 10_CD Euro_NNP ,_, I_PRP reckon_VBP ._.
That_DT sounds_VBZ good_JJ ._.
One sentence per line, of course, with every token written as word_TAG.
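If it helps, here is a minimal sketch of a sanity check for that format. It is plain Python and does not use OpenNLP at all; the function names are made up for illustration, and the "fused tokens" test is only a heuristic of mine, since a word may legitimately contain an underscore.

```python
def check_line(line):
    """Return a list of problems found in one training sentence."""
    problems = []
    for token in line.split():
        # Split on the LAST underscore, so the tag is whatever follows it.
        word, sep, tag = token.rpartition("_")
        if not sep or not word or not tag:
            problems.append(f"missing word or tag: {token!r}")
        elif "_" in word:
            # Heuristic only: often a sign that two tokens lost the space
            # between them, but some words do contain underscores.
            problems.append(f"possibly fused tokens: {token!r}")
    return problems


def check_file(path, encoding="utf-8"):
    """Yield (line_number, problem) for every suspicious token in a file."""
    with open(path, encoding=encoding) as handle:
        for number, line in enumerate(handle, start=1):
            for problem in check_line(line):
                yield number, problem
```

Running check_file("pt-pos.train") and printing what it yields should point at any line that deviates from the word_TAG layout before the trainer ever sees it.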
Apart from this, the message you're getting suggests that you're not
using the command correctly and are being redirected to the help
message. However, the command you posted seems correct to me, which is
strange. I'm not sure if this helps, but did you try using the API? Do
you know Java or some other JVM-hosted language?
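One way to double-check the command is to compare its flags with the ones listed in the usage message you got back. The sketch below does only that, in plain Python; the ALLOWED set is copied by hand from the usage text quoted in this thread, and unknown_flags is a made-up helper name. It says nothing about what the tool itself does internally.

```python
import shlex

# Flags listed in the usage message quoted in this thread.
ALLOWED = {
    "-type", "-dict", "-ngram", "-params", "-lang",
    "-cutoff", "-iterations", "-encoding", "-data", "-model",
}

# The command as posted in the original message.
COMMAND = (
    "opennlp POSTaggerTrainer -lang pt -model-type maxent -encoding utf-8 "
    "-data pt-pos.train -model pt-pos-maxent.bin"
)


def unknown_flags(command, allowed=ALLOWED):
    """Return the flags in `command` that are not in the `allowed` set."""
    tokens = shlex.split(command)
    # Treat every argument that starts with "-" as a flag.
    return [tok for tok in tokens if tok.startswith("-") and tok not in allowed]


print(unknown_flags(COMMAND))
```

For the posted command this flags -model-type, which the usage message does not list (it lists -type instead). That may be what triggers the help text, but I have not verified it, so treat swapping in -type as a guess to try rather than a confirmed fix.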
Jim
On 10/01/13 20:31, Leonel de Alencar wrote:
Hi!
I'm trying to train a new Brazilian Portuguese tagger model from the MAC-Morpho
corpus.
I used the command below, but no model was created. Instead, I just got a usage
message.
opennlp POSTaggerTrainer -lang pt -model-type maxent -encoding utf-8 -data
pt-pos.train -model pt-pos-maxent.bin
Usage: opennlp POSTaggerTrainer [-type maxent|perceptron|perceptron_sequence]
[-dict dictionaryPath] [-ngram cutoff] [-params paramsFile] -lang language
[-cutoff num] [-iterations num] [-encoding charsetName] -data trainData -model
modelFile
Arguments description:
-type maxent|perceptron|perceptron_sequence
The type of the token name finder model. One of
maxent|perceptron|perceptron_sequence.
-dict dictionaryPath
The XML tag dictionary file
-ngram cutoff
NGram cutoff. If not specified will not create ngram
dictionary.
-params paramsFile
Training parameters file.
-lang language
specifies the language which is being processed.
-cutoff num
specifies the min number of times a feature must be seen. It
is ignored if a parameters file is passed.
-iterations num
specifies the number of training iterations. It is ignored if
a parameters file is passed.
-encoding charsetName
specifies the encoding which should be used for reading and
writing text. If not specified the system default will be used.
-data trainData
the data to be used during training
-model modelFile
the output model file
Here is an excerpt from the pt-pos.train file:
Jersei_N atinge_V média_N de_PREP Cr$_CUR 1,4_NUM milhão_N em_PREP|+ a_ART
venda_N de_PREP|+ a_ART Pinhal_NPROP em_PREP São_NPROP Paulo_NPROP ._.
Programe_V sua_PROADJ viagem_N a_PREP|+ a_ART Exposição_NPROP Nacional_NPROP
do_NPROP Zebu_NPROP ,_, que_PRO-KS-REL começa_V dia_N 25_N|AP ._.
Safra_N recorde_ADJ e_KC disponibilidade_N de_PREP crédito_N ativam_V vendas_N
de_PREP máquinas_N agrícolas_ADJ ._.
A_ART degradação_N de_PREP|+ as_ART terras_N por_PREP|+ o_ART mau_ADJ uso_N
de_PREP|+ os_ART solos_N avança_V em_PREP|+ o_ART ._.
A_ART desertificação_N tornou_V crítica_ADJ a_ART produtividade_N de_PREP
52_NUM mil_NUM km²_N em_PREP|+ a_ART região_N ._.
I would appreciate it if someone could help me!
Best,
Leonel