The default format is one *sentence* per line.
If you have *one word/tag pair* per line, training will
still work (it will not blow up the trainer), but the trained
model will miss all the context features.
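
To make the point about context concrete, here is a minimal sketch in plain Python. It is not OpenNLP code and does not reproduce its feature generation. It only assumes that context features are built from neighbouring tokens on the same line, and the helper names parse_line and neighbours are made up for this illustration.

```python
def parse_line(line, sep="_"):
    """Split one training line into (token, tag) pairs."""
    pairs = []
    for item in line.split():
        # Split on the last separator only, so tokens may contain "_".
        token, tag = item.rsplit(sep, 1)
        pairs.append((token, tag))
    return pairs


def neighbours(pairs):
    """Return (previous, current, next) tokens for each position on a line.

    None marks a missing neighbour at the start or end of the line.
    This is an illustrative stand-in for context, not OpenNLP's feature set.
    """
    tokens = [tok for tok, _ in pairs]
    out = []
    for i, tok in enumerate(tokens):
        prev_tok = tokens[i - 1] if i > 0 else None
        next_tok = tokens[i + 1] if i + 1 < len(tokens) else None
        out.append((prev_tok, tok, next_tok))
    return out


# One sentence per line: "casa" is seen together with its neighbours.
sentence = parse_line("la_ART casa_NOUN e_CON la_ART strada_NOUN")
# One pair per line: "casa" has no neighbours at all.
single = parse_line("casa_NOUN")

print(neighbours(sentence))
print(neighbours(single))
```

With a single pair per line every token is its own "sentence", so there is nothing to its left or right for the model to learn from.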
Jörn
On 07/27/2012 12:26 PM, Alessandra Donnini wrote:
It helps. Thanks
On 27 Jul 2012, at 09:56, Muhammad Dhito Prihardhanto wrote:
I once trained a POS tagger model for the Indonesian language. I defined some
tags for Indonesian words that differed somewhat from the English POS tags.
I also used a 'token_pair' format in the sentence list. I didn't provide any
tag dictionary.
And ... that worked well, without problems. I could create an Indonesian
POS tagger model and use it to tag some Indonesian text as well.
Hope this can help.
--
Dhito
On Fri, Jul 27, 2012 at 2:27 PM, Alessandra Donnini <[email protected]> wrote:
OK, I know I'm new to OpenNLP, and my question may be wrong, but I would
like to understand: can anyone answer?
thanks
Alessandra
Begin forwarded message:
From: Alessandra Donnini <[email protected]>
Date: 20 July 2012 17:04:27 GMT+02:00
To: [email protected]
Subject: Training a POS tagger model
I would like to provide (train) a POS tagger model for the Italian language.
I have some questions:
- May I use a list of token_tag pairs in place of a sentence list? Something
like:
casa_NOUN
e_CON (conjunction)
...
in place of
la_ART casa_NOUN e_CON la_ART strada_NOUN
...
because I have found an Italian word list.
- Do I need to provide a tag dictionary? Is there a default tag
dictionary?
thanks
Alessandra