The default format is one *sentence* per line.
If you put *one word/tag pair* per line instead, training
will still run (the trainer will not fail), but the trained
model will be missing all the context features.
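
To illustrate the point about context, here is a small Python sketch. It is not OpenNLP code: `parse_line` and `context_windows` are hypothetical helper names, and the `_` separator is simply taken from the examples in this thread. It assumes that context is drawn from neighbouring tokens on the same line, so a line holding a single pair has no neighbours to look at.

```python
def parse_line(line, sep="_"):
    """Split one training line into (token, tag) pairs.

    Assumes every item looks like token<sep>tag, as in the thread's examples.
    """
    pairs = []
    for item in line.split():
        # rpartition splits on the LAST separator, so tokens may contain "_"
        token, _, tag = item.rpartition(sep)
        pairs.append((token, tag))
    return pairs


def context_windows(pairs):
    """For each token, record its previous and next token (None at line edges)."""
    tokens = [tok for tok, _ in pairs]
    windows = []
    for i, (tok, tag) in enumerate(pairs):
        prev_tok = tokens[i - 1] if i > 0 else None
        next_tok = tokens[i + 1] if i + 1 < len(tokens) else None
        windows.append((prev_tok, tok, next_tok, tag))
    return windows


sentence_line = "la_ART casa_NOUN e_CON la_ART strada_NOUN"
single_pair_line = "casa_NOUN"

sentence_windows = context_windows(parse_line(sentence_line))
single_windows = context_windows(parse_line(single_pair_line))

for window in sentence_windows + single_windows:
    print(window)
```

In the sentence line, `casa` is seen together with `la` and `e`; in the single-pair line both neighbours are `None`, which is roughly what "missing the context features" means here.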

Jörn

On 07/27/2012 12:26 PM, Alessandra Donnini wrote:
It helps. Thanks



On 27 Jul 2012, at 09:56, Muhammad Dhito Prihardhanto wrote:

I once trained a POS tagger model for Indonesian. I defined some
tags for Indonesian words that differ from the English POS tags.

I also used the 'token_tag' format in a sentence list. I didn't provide a
tag dictionary.

And ... that worked well, without any problems. I was able to create an Indonesian
POS tagger model and use it to evaluate some Indonesian text as well.

Hope this helps.

--
Dhito

On Fri, Jul 27, 2012 at 2:27 PM, Alessandra Donnini <[email protected]> wrote:

OK, I know I'm new to OpenNLP and my question may be misguided, but I would
like to understand. Can anyone answer?
thanks
Alessandra

Begin forwarded message:

From: Alessandra Donnini <[email protected]>
Date: 20 July 2012 17:04:27 GMT+02:00
To: [email protected]
Subject: Training a POS tagger model

I would like to train a POS tagger model for the Italian language.
I have some questions:
- May I use a list of token_tag pairs in place of a sentence list? Something
like:
casa_NOUN
e_CON (conjunction)
...
in place of

la_ART casa_NOUN e_CON la_ART strada_NOUN
...
because I have found an Italian word list.

- Do I need to provide a tag dictionary? Is there a default tag
dictionary?
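
As a rough illustration of what a tag dictionary conceptually holds (a mapping from a word to the tags it may take), a word list in the `token_tag` form above could be read as follows. This is a plain Python sketch, not OpenNLP's dictionary format or API; `build_tag_lookup` is a hypothetical name, and the `la_PRON` entry is made up for illustration.

```python
from collections import defaultdict


def build_tag_lookup(lines, sep="_"):
    """Collect, for each lowercased word, the set of tags seen with it."""
    lookup = defaultdict(set)
    for line in lines:
        for item in line.split():
            token, found, tag = item.rpartition(sep)
            # Skip items that are not of the form token<sep>tag
            if not found or not token or not tag:
                continue
            lookup[token.lower()].add(tag)
    return dict(lookup)


# Illustrative entries only; "la_PRON" is a hypothetical addition.
word_list = ["casa_NOUN", "e_CON", "la_ART", "la_PRON"]
tag_lookup = build_tag_lookup(word_list)

for word in sorted(tag_lookup):
    print(word, sorted(tag_lookup[word]))
```

A lookup like this only says which tags a word can have; it carries no information about the order of words in a sentence.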
thanks
Alessandra



